# Gibson assembly implementation

From conversations with Fred, Gibson assembly currently seems like the prudent route for building plasmids, so I want to implement automated protocol design in the pipeline. When actually designing variable regions I think an extra complementary region will be added to each one, to avoid having to create primers for every region, but the concepts will be the same.

Here I am following along with the [pyDNA gibson assembly tutorial](https://github.com/BjornFJohansson/pydna-examples/blob/master/notebooks/gibson/gibson.ipynb) using pMAL-hRNASEH1.

![](files/pmal.png)

## Gibson assembly background and in context

[AddGene gibson assembly](https://www.addgene.org/protocols/gibson-assembly/)

![](https://media.addgene.org/data/easy-thumbnails/filer_public/cms/filer_public/15/c4/15c45cf9-3d03-4f61-93e9-c39159f6916e/gibson_assembly_overview_1.jpg__700x351_q85_crop_subsampling-2_upscale.png)

Here the linearized recipient plasmid would be some backbone we want to clone variable regions into. The `PCR Product / DNA fragment` would be the variable region. Under this workflow I think we would want to design the A and B complementary regions into each fragment. This would avoid having to use primers for each VR.

These regions would then depend on the plasmid backbone that is selected and the location where the insert is made. Additionally, this method seems best for a single-fragment insert. In reality we would ideally want to insert both initiator and terminator sequences at the same time, along with promoters.

```
----| Promoter -> |--| Initiator |--| Extension region | -- | Terminator | ------ | Extension region | -- | <- Promoter | 
```

[OpenWetWare](https://openwetware.org/wiki/Janet_B._Matsen:Guide_to_Gibson_Assembly)

![](https://s3-us-west-2.amazonaws.com/oww-files-thumb/7/7b/Gibson_overview_cartoon_JM.png/900px-Gibson_overview_cartoon_JM.png)

## Make sure pyDNA is available

In [1]:
import os
from pydna.readers import read

## Read backbone plasmid

In [2]:
backbone = 'files/pMAL_RH1.gb'
assert os.path.isfile(backbone)
pMal = read(backbone)
pMal

Get plasmid features (workflow should require plasmid backbone in gb format)

In [3]:
pMal.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  |  Len | type         | orf? |
|-----|------------------|-----|------|------|------|--------------|------|
|   0 | nd               | --> | 0    | 7475 | 7475 | source       |  no  |
|   1 | L:lacI           | --> | 80   | 1163 | 1083 | CDS          |  no  |
|   2 | L:tac promoter t | --> | 1405 | 1433 |   28 | promoter     |  no  |
|   3 | L:MBP            | --> | 1527 | 2689 | 1162 | CDS          |  no  |
|   4 | L:Factor Xa site | --> | 2676 | 2688 |   12 | misc_signal  |  no  |
|   5 | L:RNaseH1 (-MLS) | --> | 2700 | 3516 |  816 | misc_feature | yes  |
|   6 | L:HA tag HA\tag  | --> | 2706 | 2733 |   27 | misc_feature |  no  |
|   7 | L:lacZ alpha lac | --> | 3563 | 3745 |  182 | CDS          |  no  |
|   8 | L:AmpR           | --> | 4246 | 5107 |  861 | CDS          | yes  |
|   9 | L:- M13 ori + -\ | <-- | 5148 | 5662 |  514 | rep_origin   |  no  |
|  10 | L:ori            | --> | 5772 | 6361 |  589 | rep_origin   |  no  |
|  11 | L:rop            | <-- | 6790 | 6982 |  192 | misc_feature |  no  |

Let's say we want to place the variable region where `L:RNaseH1 (-MLS)` currently is.

In [4]:
RNAseH = pMal.extract_feature(5)
RNAseH

Dseqrecord(-816)

## Read simulated variable region

For the variable region we want to insert, I am using the `cyc.gb` file provided in the Gibson assembly example. This GenBank file contains only one feature, an ORF. When implementing this, variable region FASTA files will need to be converted into this format.

Contents of the file shown below.

```

LOCUS       CYC1                     330 bp ds-DNA     linear       07-FEB-2017
DEFINITION  .
ACCESSION   
VERSION     
SOURCE      .
  ORGANISM  .
COMMENT     
COMMENT     ApEinfo:methylated:1
ORIGIN
        1 ATGACTGAAT TCAAGGCCGG TTCTGCTAAG AAAGGTGCTA CACTTTTCAA GACTAGATGT
       61 CTACAATGCC ACACCGTGGA AAAGGGTGGC CCACATAAGG TTGGTCCAAA CTTGCATGGT
      121 ATCTTTGGCA GACACTCTGG TCAAGCTGAA GGGTATTCGT ACACAGATGC CAATATCAAG
      181 AAAAACGTGT TGTGGGACGA AAATAACATG TCAGAGTACT TGACTAACCC AAAGAAATAT
      241 ATTCCTGGTA CCAAGATGGC CTTTGGTGGG TTGAAGAAGG AAAAAGACAG AAACGACTTA
      301 ATTACCTACT TGAAAAAAGC CTGTGAGTAA   
//
```

In [5]:
cyc1_path = 'files/cyc1.gb'
cyc1 = read(cyc1_path)
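The FASTA to GenBank conversion mentioned above could look roughly like the sketch below. It is pure Python and only mimics the minimal layout of the `cyc1.gb` file shown earlier; the function names (`parse_fasta`, `origin_block`, `fasta_to_genbank`) are hypothetical, and I have not checked that every GenBank parser accepts this exact LOCUS line. In the real pipeline Biopython's `SeqIO` would likely be the more robust route.

```python
from datetime import date


def parse_fasta(text):
    """Return (name, sequence) for the first record in FASTA-formatted text."""
    name, chunks = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                break  # only the first record is converted in this sketch
            name = (line[1:].split() or ["unnamed"])[0]
        elif line:
            chunks.append(line)
    if name is None:
        raise ValueError("no FASTA header found")
    return name, "".join(chunks).upper()


def origin_block(seq):
    """Format a sequence as ORIGIN lines: 60 bases per line in blocks of 10."""
    lines = []
    for start in range(0, len(seq), 60):
        row = seq[start:start + 60]
        blocks = " ".join(row[i:i + 10] for i in range(0, len(row), 10))
        lines.append(f"{start + 1:>9} {blocks}")
    return "\n".join(lines)


def fasta_to_genbank(text):
    """Build a minimal GenBank-style record (no FEATURES table) from FASTA text."""
    name, seq = parse_fasta(text)
    stamp = date.today().strftime("%d-%b-%Y").upper()
    header = [
        # LOCUS layout copied from the cyc1 example above (an assumption, not a spec)
        f"LOCUS       {name:<24} {len(seq)} bp ds-DNA     linear       {stamp}",
        "DEFINITION  .",
        "ORIGIN",
    ]
    return "\n".join(header + [origin_block(seq), "//"]) + "\n"


print(fasta_to_genbank(">VR1 example\nATGACTGAATTCAAGGCCGGTTCTGCTAAG\n"))
```

The record has no features, so a feature would still need to be added afterwards for the region to render as a labeled block.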

## Linearize backbone

In [6]:
from Bio.Restriction import SwaI

linear_backbone = pMal.linearize(SwaI)
linear_backbone

Dseqrecord(-7475)

## Design primers for Cyc1 (variable region)

This would be needed if we wanted to amplify the variable region prior to cloning to increase its concentration.

In [7]:
from pydna.design import primer_design
from pydna.design import assembly_fragments

In [8]:
cyc1_amplicon = primer_design(cyc1)
cyc1_amplicon.figure()

5ATGACTGAATTCAAGGCC...TGAAAAAAGCCTGTGAGTAA3
                      ||||||||||||||||||||
                     3ACTTTTTTCGGACACTCATT5
5ATGACTGAATTCAAGGCC3
 ||||||||||||||||||
3TACTGACTTAAGTTCCGG...ACTTTTTTCGGACACTCATT5

In [9]:
fragment_list = assembly_fragments(
    (linear_backbone, cyc1_amplicon, linear_backbone)
)

The linear backbone appears at both the front and the end of the fragment list because we want the final construct to be circular.

In [10]:
fragment_list

[Dseqrecord(-7475), Amplicon(400), Dseqrecord(-7475)]

In [11]:
fragment_list[1].figure()

                                   5ATGACTGAATTCAAGGCC...TGAAAAAAGCCTGTGAGTAA3
                                                         ||||||||||||||||||||
                                                        3ACTTTTTTCGGACACTCATTTTTAACATTTGCAATTATAAAACAATTTTAAGCGC5
5CCCCAAAAACAGGAAGATTGTATAAGCAAATATTTATGACTGAATTCAAGGCC3
                                    ||||||||||||||||||
                                   3TACTGACTTAAGTTCCGG...ACTTTTTTCGGACACTCATT5

In [12]:
from pydna.assembly import Assembly

In [13]:
fragment_list = fragment_list[:-1]

In [14]:
fragment_list

[Dseqrecord(-7475), Amplicon(400)]

In [15]:
asm = Assembly(fragment_list)
asm

Assembly
fragments..: 7475bp 400bp
limit(bp)..: 25
G.nodes....: 4
algorithm..: common_sub_strings

In [16]:
canidate = asm.assemble_circular()[0]
canidate

Not exactly sure what we have just created here / where the variable region ended up in the final assembly.

In [17]:
test_assembly = canidate
canidate.write('files/test_assembly.gb')

Visualize in snapgene

![](files/test.1.png)

The insert was 400 bp, so it was definitely added between lacZ alpha and the AmpR gene, but it is not labeled and it has removed the restriction site. It seems that linearizing the plasmid at that point clones the variable region in at that location. Not really what I want to do.

## Round 2: Changing plasmid backbones and going back to design

Switching to a more realistic plasmid choice, [pFC8](files/pFC8.gb). Not sure why I thought pMal was the way to go.

![](files/pFC8.png)

Let's say, just for simplicity, that we leave in the R-loop forming region (it could be cut out to reduce overall length) and use the T3 promoter. We then want to insert the construct (variable region) between `SNRPN` and the `T3` promoter by linearizing at the `KpnI` site.

Note that this will create sticky ends, which might matter since this approach does not use PCR to create homologous regions between the plasmid and the variable region. Actually it should not, because the ends would be chewed back by the exonuclease.

Sequence at `KpnI` cut site.

![](files/linearization.png)

In [18]:
from Bio.Restriction import KpnI, Analysis, RestrictionBatch
from Bio.Seq import Seq

rb = RestrictionBatch([KpnI])

Sequence overlapping most of region shown above (including `KpnI` site)

In [19]:
s = Seq('tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg')
Analong = Analysis(rb, s)

In [20]:
Analong.print_as('map')
Analong.print_that()

                            29 KpnI
                            |                               
tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg
|||||||||||||||||||||||||||||||||||||||||||||||||||||
aggttctggagctcccccccgggccatgggtcgaaaacaagggaaatcactcc
1                                                  53




Since we are trying to insert our variable region construct at this restriction site, the first and last inserted sequences should have homology arms matching the sequences flanking this site.

Below would be, I guess, the most extreme version of this, where you are cloning in both promoters plus the initiation, termination and extension regions. Regions of the same solid color indicate homology.

![](files/insert.png)

But in each backbone, including `pFC8`, there is already at least one usable promoter that could be taken advantage of. Additionally, it may not be best to clone in the initiation and termination regions at the same time, since we are not 100% sure how long the R-loops that form will be. This is critical for placing the termination region relative to the promoter: it needs to be far enough away that, by the time R-loops arrive, it would be reasonable for them to terminate, but close enough that R-loops are, at least on average, actually able to reach it.

That might look something more like the images below.

### Initiation construct

![](files/insert1.png)

The average length of R-loops formed using the initiation region would then inform termination region design.

### Termination construct

![](files/insert2.png)

## Basic initiation construct

Back to design via pyDNA for a basic initiation region construct.

Read in pFC8 and linearize by cutting at the `KpnI` site.

In [21]:
pFC8 = read('files/pFC8.gb')
pFC8_linear = pFC8.linearize(KpnI)

In [22]:
pFC8.list_features()

| Ft# | Label or Note | Dir | Sta  | End  | Len | type         | orf? |
|-----|---------------|-----|------|------|-----|--------------|------|
|   0 | L:T7\promoter | --> | 11   | 33   |  22 | promoter     |  no  |
|   1 | L:T7\+1\Site  | --> | 28   | 29   |   1 | misc_feature |  no  |
|   2 | L:SNRPN       | <-- | 51   | 1032 | 981 | CDS          |  no  |
|   3 | L:T3\promoter | <-- | 1046 | 1063 |  17 | promoter     |  no  |

In [23]:
pFC8_cut = Analysis(rb, pFC8.seq)
cuts = pFC8_cut.full()
cuts

{KpnI: [1033]}

Homology arms to the left and right of the cut site

**Note**

```
The position returned by the method search is the first base of the downstream segment produced by a restriction (i.e. the first base after the position where the enzyme will cut). The Restriction package follows biological convention (the first base of a sequence is base 1). 
```

Let's pretend the cut is at 3 (the output from the method):
```
1 2 | 3 4 5 6 7 
```

In [24]:
cut_test = 3
seq_test = [1, 2, 3, 4, 5, 6, 7]
left, right = seq_test[:cut_test-1], seq_test[cut_test-1:]
print("left:", left)
print("right:", right)


left: [1, 2]
right: [3, 4, 5, 6, 7]


In [25]:
homology_length = 20
cut_site = cuts[KpnI][0]

In [26]:
h_left, h_right = pFC8.seq[cut_site-1-20:cut_site-1], pFC8.seq[cut_site-1:cut_site-1+20]
span = pFC8.seq[cut_site-1-20:cut_site-1+20]

assert len(h_left) == len(h_right)
assert h_left + h_right == span

print('left:', h_left)
print('right:', h_right)
print('span:', pFC8.seq[cut_site-1-20:cut_site-1+20])

left: CTCGAGGGGGGGCCCGGTAC
right: CCAGCTTTTGTTCCCTTTAG
span: CTCGAGGGGGGGCCCGGTACCCAGCTTTTGTTCCCTTTAG


Here, if pFC8 was cut with `KpnI` then the head and tail of the construct would need to have homology to the sequences above.
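The slicing above can be wrapped into a small helper. This is a minimal pure-Python sketch on plain strings; `homology_arms` and its `circular` flag are hypothetical names, and it assumes the cut position follows the `Bio.Restriction` convention quoted in the note (1-based, first base of the downstream segment). The wrap-around branch is only there because a cut near the origin of a circular plasmid would otherwise run off the end of the sequence.

```python
def homology_arms(seq, cut_pos, length=20, circular=True):
    """Return (left_arm, right_arm) flanking a cut.

    cut_pos is 1-based and points at the first base downstream of the cut.
    """
    i = cut_pos - 1          # convert to a 0-based Python index
    n = len(seq)
    if circular:
        # index modulo the sequence length so arms can span the origin
        left = "".join(seq[(i - length + k) % n] for k in range(length))
        right = "".join(seq[(i + k) % n] for k in range(length))
    else:
        if i - length < 0 or i + length > n:
            raise ValueError("arm would run past the end of a linear sequence")
        left, right = seq[i - length:i], seq[i:i + length]
    return left, right


# The 53 bp window around the KpnI site shown earlier, where KpnI was reported at 29
window = "tccaagacctcgagggggggcccggtacccagcttttgttccctttagtgagg".upper()
left, right = homology_arms(window, 29, length=20, circular=False)
print("left: ", left)
print("right:", right)
```

On that window the two arms come out the same as the `h_left` and `h_right` values printed above.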

Genbank format version of example initiation and extension regions.

### Initiation region

Left arm (`h_left`) added to the start of the sequence.

```
LOCUS       INIT1                     330 bp ds-DNA     linear       07-FEB-2017
DEFINITION  .
ACCESSION   
VERSION     
SOURCE      .
  ORGANISM  .
COMMENT     
COMMENT     ApEinfo:methylated:1
ORIGIN
        1 CTCGAGGGGG GGCCCGGTAC TTCTGCTAAG AAAGGTGCTA CACTTTTCAA GACTAGATGT
       61 CTACAATGCC ACACCGTGGA AAAGGGTGGC CCACATAAGG TTGGTCCAAA CTTGCATGGT
      121 ATCTTTGGCA GACACTCTGG TCAAGCTGAA GGGTATTCGT ACACAGATGC CAATATCAAG
      181 AAAAACGTGT TGTGGGACGA AAATAACATG TCAGAGTACT TGACTAACCC AAAGAAATAT
      241 ATTCCTGGTA CCAAGATGGC CTTTGGTGGG TTGAAGAAGG AAAAAGACAG AAACGACTTA
      301 ATTACCTACT TGAAAAAAGC CTGTGAGTAA   
//
```

### Extension region

Really the same as the initiation region: the start is a copy of the last 20 bp of the initiation region, and the right homology arm is added to the end.

```
LOCUS       EXTEN1                     330 bp ds-DNA     linear       07-FEB-2017
DEFINITION  .
ACCESSION   
VERSION     
SOURCE      .
  ORGANISM  .
COMMENT     
COMMENT     ApEinfo:methylated:1
ORIGIN
        1 TGAAAAAAGC CTGTGAGTAA TTCTGCTAAG AAAGGTGCTA CACTTTTCAA GACTAGATGT
       61 CTACAATGCC ACACCGTGGA AAAGGGTGGC CCACATAAGG TTGGTCCAAA CTTGCATGGT
      121 ATCTTTGGCA GACACTCTGG TCAAGCTGAA GGGTATTCGT ACACAGATGC CAATATCAAG
      181 AAAAACGTGT TGTGGGACGA AAATAACATG TCAGAGTACT TGACTAACCC AAAGAAATAT
      241 ATTCCTGGTA CCAAGATGGC CTTTGGTGGG TTGAAGAAGG AAAAAGACAG AAACGACTTA
      301 ATTACCTACT CCAGCTTTTG TTCCCTTTAG   
//
```

**Note**

For the inserted region to be labeled and rendered as a colored block when viewed with SnapGene or a similar program, the GenBank file should include a feature that describes the inserted region.

Example feature added after running code is shown below.

```
FEATURES             Location/Qualifiers
     CDS             1..330
                     /label="- M13 ori +"
                     /label="-\M13\ori\+"
```

Adding features via pyDNA. Will likely need to do this in the pipeline.

Read in the new regions

In [27]:
init_path = 'files/test_init.gb'
exten_path = 'files/test_exten.gb'

init = read(init_path)
exten = read(exten_path)

init_amplicon = primer_design(init)
exten_amplicon = primer_design(exten)


In [28]:
exten.features

[]

In [29]:
init.add_feature(x=0, y=len(init), type='CDS', label='INIT-1')
exten.add_feature(x=0, y=len(exten), type='CDS', label='EXTEN-1')
print(init.features)
print(exten.features)

[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(330), strand=1), type='CDS')]
[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(330), strand=1), type='CDS')]


Adding features in this way does not seem to affect the final render.

In [30]:
init.write()

Actually maybe it does, output from write is below.

```
LOCUS       INIT1                    330 bp    DNA     linear   UNK 07-FEB-2017
DEFINITION  .
ACCESSION   INIT1
VERSION     INIT1
KEYWORDS    .
SOURCE      
  ORGANISM  .
            .
COMMENT     
            ApEinfo:methylated:1
FEATURES             Location/Qualifiers
     CDS             1..330
                     /label="hello"
ORIGIN
        1 ctcgaggggg ggcccggtac ttctgctaag aaaggtgcta cacttttcaa gactagatgt
       61 ctacaatgcc acaccgtgga aaagggtggc ccacataagg ttggtccaaa cttgcatggt
      121 atctttggca gacactctgg tcaagctgaa gggtattcgt acacagatgc caatatcaag
      181 aaaaacgtgt tgtgggacga aaataacatg tcagagtact tgactaaccc aaagaaatat
      241 attcctggta ccaagatggc ctttggtggg ttgaagaagg aaaaagacag aaacgactta
      301 attacctact tgaaaaaagc ctgtgagtaa
//
```

Yes looks like this will be a valid method for use in the workflow, rendered by snapgene

![](files/hello_snap.png)

## pFC8 fragment construction

I think this labels the amplicons?

In [31]:
pFC8_linear.locus = "pFC8"
init_amplicon.locus = "INIT_REGION_1"
exten_amplicon.locus = "EXTEN_REGION_1"

Make the fragment list; the inserts must be of type `Amplicon`. Using `primer_design` assumes the input is the template. This is not really what we want, since we are building at least the variable regions from nothing and can have whatever we want in there. This may or may not be true for the extension regions; in the latter case normal PCR-based assembly would be required. Either way, need to check what `primer_design` actually produces.

In [32]:
init_amplicon.figure()
str(init_amplicon.seq)

'CTCGAGGGGGGGCCCGGTACTTCTGCTAAGAAAGGTGCTACACTTTTCAAGACTAGATGTCTACAATGCCACACCGTGGAAAAGGGTGGCCCACATAAGGTTGGTCCAAACTTGCATGGTATCTTTGGCAGACACTCTGGTCAAGCTGAAGGGTATTCGTACACAGATGCCAATATCAAGAAAAACGTGTTGTGGGACGAAAATAACATGTCAGAGTACTTGACTAACCCAAAGAAATATATTCCTGGTACCAAGATGGCCTTTGGTGGGTTGAAGAAGGAAAAAGACAGAAACGACTTAATTACCTACTTGAAAAAAGCCTGTGAGTAA'

In [33]:
init_amplicon.__dict__.keys()

dict_keys(['_seq', 'id', 'name', 'description', 'dbxrefs', 'annotations', '_per_letter_annotations', 'features', 'map_target', 'n', 'template', 'forward_primer', 'reverse_primer'])

Sequence is same as input

In [34]:
pFC8_frags = assembly_fragments(
    (pFC8_linear, init_amplicon, exten_amplicon, pFC8_linear)
)

In [35]:
pFC8_frags[1].figure()

                                   5CTCGAGGGGGGGC...AATTACCTACTTGAAAAAAGCCTGTGAGTAA3
                                                    |||||||||||||||||||||||||||||||
                                                   3TTAATGGATGAACTTTTTTCGGACACTCATTACTTTTTTCGGACACTCA5
5TCAGTACTCCAAGACCTCGAGGGGGGGCCCGGTACCTCGAGGGGGGGC3
                                    |||||||||||||
                                   3GAGCTCCCCCCCG...TTAATGGATGAACTTTTTTCGGACACTCATT5

ok...

In [36]:
pFC8_frags = pFC8_frags[:-1]
pFC8_frags

[Dseqrecord(-3593), Amplicon(383), Amplicon(383)]

In [37]:
asm_pFC8 = Assembly(pFC8_frags)
asm_pFC8

Assembly
fragments..: 3593bp 383bp 383bp
limit(bp)..: 25
G.nodes....: 8
algorithm..: common_sub_strings

In [38]:
canidate_pFC8 = asm_pFC8.assemble_circular()[0]
canidate_pFC8

Where are the 35 bp of homology coming from? Let's take a look at the result.

In [39]:
canidate_pFC8.write("files/pFC8_assembly_test.gb")

See what the primers look like

In [40]:
from pydna.amplicon import Amplicon
amplicons1 = [x for x in pFC8_frags if isinstance(x, Amplicon)]

# Get forward and reverse primer for each Amplicon
primers1 = [(y.forward_primer, y.reverse_primer) for y in amplicons1]

In [41]:
# print primer pairs:
for pair in primers1:
    print(pair[0].format("fasta"))
    print(pair[1].format("fasta"))
    print()

>f330 INIT1
TCAGTACTCCAAGACCTCGAGGGGGGGCCCGGTACCTCGAGGGGGGGC

>r330 INIT1
ACTCACAGGCTTTTTTCATTACTCACAGGCTTTTTTCAAGTAGGTAATT


>f330 EXTEN1
AAAAAAGCCTGTGAGTAATGAAAAAAGCCTGTGAGT

>r330 EXTEN1
ATTAACCCTCACTAAAGGGAACAAAAGCTGGGTACCTAAAGGGAACAAAAGCT
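Looking at the primers printed above, the backbone-facing tails are 35 bp long, which I believe corresponds to the `overlap` argument of `assembly_fragments` (as far as I know its default is 35, but this is worth confirming in the pyDNA docs). The sketch below is not pyDNA's implementation; it is a pure-Python illustration of how such tailed primers can be built, with a fixed annealing length standing in for whatever Tm-based length `primer_design` actually picks. `tailed_primers` and its parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primers(upstream, insert, downstream, overlap=35, anneal=18):
    """Build forward and reverse primers that add homology tails to `insert`.

    upstream / downstream : neighbouring fragments (top strand, 5'->3')
    overlap               : bases of the neighbour copied into each tail
    anneal                : bases of the primer that anneal to the insert
    """
    # forward primer: end of the upstream neighbour + start of the insert
    forward = upstream[-overlap:] + insert[:anneal]
    # reverse primer: reverse complement of (end of insert + start of downstream)
    reverse = reverse_complement(insert[-anneal:] + downstream[:overlap])
    return forward, reverse


# End of KpnI-linearized pFC8 (taken from the INIT1 forward primer above)
backbone_end = "TCAGTACTCCAAGACCTCGAGGGGGGGCCCGGTAC"
init_start = "CTCGAGGGGGGGCCCGGTACTTCTGCTAAG"
fwd, _ = tailed_primers(backbone_end, init_start, "", overlap=35, anneal=13)
print(fwd)
```

With a 35 bp tail and a 13 bp annealing region this rebuilds the `f330 INIT1` primer, which also shows why the 20 bp arm that was already in `INIT1` ends up duplicated: the tail is added on top of it.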




So it looks like the regions we wanted to clone in definitely made it in there. The highlighted region below is the complete 330 bp of the `INIT1` region. However, it is in the wrong location relative to the promoter, so we would have to swap the relative location of things.

![](files/pFC8_test_assemble.png)

And if we look at the sequence and search for what was the left homology arm, `CTCGAGGGGGGGCCCGGTAC`, which was included in the sequence, we can see that it was not recognized as the homologous region and is now duplicated.

![](files/pFC8_test_assemble_dup.png)

So when used without any modification, this approach assumes that you want primers for your sequences and does not assume any kind of homology between them.

One approach could be to let the program think that primers need to be made for all sequences, and for those that are to be completely synthesized, not append any additional homology arms beforehand. Then only order primers for the regions that actually require them, and for the designed sequences append the homology arms directly to the sequence. The purpose of the PCR step is to create the homologous regions, but for the designed regions, namely the variable regions, this is not needed if we know the backbone and the location where each VR is to be cloned in beforehand.
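For the fully synthesized case, the homology can be written straight into the designed sequences with no primers involved. The following is a minimal pure-Python sketch of that idea, not part of pyDNA: `design_synthetic_fragments` and `shared_overlap` are hypothetical helpers, the cores are toy sequences, and unlike the hand-edited `INIT1` file above (where the first 20 bp were overwritten) the arms are prepended or appended to the cores.

```python
def design_synthetic_fragments(cores, left_arm, right_arm, junction=20):
    """Return synthesis-ready fragments that already carry their homology.

    cores     : ordered designed sequences (top strand, 5'->3')
    left_arm  : backbone sequence immediately upstream of the cut
    right_arm : backbone sequence immediately downstream of the cut
    junction  : bases shared between neighbouring fragments
    """
    fragments = []
    last = len(cores) - 1
    for idx, core in enumerate(cores):
        # first fragment starts with the backbone arm, later ones with
        # a copy of the end of the previous core
        head = left_arm if idx == 0 else cores[idx - 1][-junction:]
        tail = right_arm if idx == last else ""
        fragments.append(head + core + tail)
    return fragments


def shared_overlap(upstream, downstream):
    """Length of the longest suffix of `upstream` that is a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return k
    return 0


LEFT, RIGHT = "CTCGAGGGGGGGCCCGGTAC", "CCAGCTTTTGTTCCCTTTAG"  # arms at the KpnI site
cores = [
    "TTCTGCTAAGAAAGGTGCTACACTTTTCAAGACTAGATGT",  # toy initiation core
    "CTACAATGCCACACCGTGGAAAAGGGTGGCCCACATAAGG",  # toy extension core
]
init_frag, exten_frag = design_synthetic_fragments(cores, LEFT, RIGHT)
print(shared_overlap(init_frag, exten_frag))
```

Fragments built this way would be passed to the assembly step as-is, while anything amplified from a template would still go through `primer_design`.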

## Correcting orientation

In [42]:
pFC8_frags_orr = assembly_fragments(
    (pFC8_linear, exten_amplicon, init_amplicon, pFC8_linear)
)

In [43]:
pFC8_frags_orr = pFC8_frags_orr[:-1]
pFC8_frags_orr

[Dseqrecord(-3593), Amplicon(383), Amplicon(383)]

In [44]:
asm_pFC8_orr = Assembly(pFC8_frags_orr)
asm_pFC8_orr

Assembly
fragments..: 3593bp 383bp 383bp
limit(bp)..: 25
G.nodes....: 8
algorithm..: common_sub_strings

In [45]:
canidate_pFC8_orr = asm_pFC8_orr.assemble_circular()[0]
canidate_pFC8_orr

In [46]:
canidate_pFC8_orr.write('files/pFC8_test.orr.gb')

Things are now in the correct order but in the wrong orientation relative to the T3 promoter. Still have some work to do, and need to dig into exactly how orientations are determined and positioned when cutting a circular plasmid.

![](files/init-1-orr.png)

## Trying to correct orientation

It might actually work to just label the features of each inserted region with the same orientation as the promoter they will be transcribed by, in this case T3.

Default strand is 1; set to -1 for the reverse strand?

In [47]:
init_path = 'files/test_init.gb'
exten_path = 'files/test_exten.gb'

init = read(init_path)
exten = read(exten_path)

init.add_feature(x=0, y=len(init), type='CDS', label='INIT-1', strand=-1)
exten.add_feature(x=0, y=len(exten), type='CDS', label='EXTEN-1', strand=-1)

init_amplicon = primer_design(init)
exten_amplicon = primer_design(exten)

Really should put in function by now

In [48]:
pFC8_frags_rev = assembly_fragments(
    (pFC8_linear, exten_amplicon, init_amplicon, pFC8_linear)
)
asm_pFC8_rev = Assembly(pFC8_frags_rev)
# use the assembly built from the strand=-1 fragments in this cell
canidate_pFC8_rev = asm_pFC8_rev.assemble_circular()[0]
canidate_pFC8_rev

In [49]:
canidate_pFC8_rev.write('files/canidate_pFC8_rev.gb')

It may be that we need to extract the feature and use that instead of just passing in the complete GenBank file object.

In [51]:
init.list_features()

| Ft# | Label or Note | Dir | Sta | End | Len | type | orf? |
|-----|---------------|-----|-----|-----|-----|------|------|
|   0 | L:I N I T - 1 | <-- | 0   | 330 | 330 | CDS  |  no  |

Extract feature and create amplicon from it

In [53]:
init_feat = init.extract_feature(0)
init_feat_amplicon = primer_design(init_feat)

Remake the assembly

In [88]:
pFC8_frags_rev_feat = assembly_fragments(
    (pFC8_linear, exten_amplicon, init_feat_amplicon, pFC8_linear)
)
asm_pFC8_rev_feat = Assembly(pFC8_frags_rev_feat)
canidate_pFC8_rev_feat = asm_pFC8_rev_feat.assemble_circular()[0]
asm_pFC8_rev_feat

Assembly
fragments..: 3593bp 383bp 383bp 3593bp
limit(bp)..: 25
G.nodes....: 8
algorithm..: common_sub_strings

In [89]:
canidate_pFC8_rev_feat.write('files/canidate_pFC8_rev_feat.gb')

That seemed to reverse the orientation that we wanted for some reason; maybe a bug earlier?

![](files/test_assembly_init_feat.png)

In [90]:
pFC8_frags_rev_feat = assembly_fragments(
    (pFC8_linear, exten_amplicon, init_amplicon, pFC8_linear)
)
asm_pFC8_rev_feat = Assembly(pFC8_frags_rev_feat)
canidate_pFC8_rev_feat = asm_pFC8_rev_feat.assemble_circular()[0]
canidate_pFC8_rev_feat.write('files/canidate_pFC8_og.gb')

Using original init amplicon

![](files/test_assembly_init_og.png)

Everything is looking good here direction-wise, but need to check that the sequence location is actually correct. `ctcgagggggggcccggtac` is the start of the init sequence, so it should be found just downstream of the T3 promoter.

![](files/wrong_orrientation.png)

However, when we look at the sequence we see that it is actually located at the opposite end of init relative to the promoter. This may be because, while the rendered feature called init-1 is `-` in orientation, the actual sequence is still in `+` orientation.
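The distinction between relabeling a feature and actually flipping the sequence can be shown without pyDNA. This is a minimal pure-Python sketch; the `Record` class is a hypothetical stand-in for a sequence record with a single strand annotation, and the sequence is just the first and last 20 bp of `INIT1` joined together.

```python
from dataclasses import dataclass, replace

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Record:
    seq: str          # top strand, 5'->3'
    strand: int = 1   # annotation only: +1 or -1


# first 20 bp + last 20 bp of INIT1 as a short stand-in
init = Record(seq="CTCGAGGGGGGGCCCGGTAC" + "TGAAAAAAGCCTGTGAGTAA")

# Changing only the annotation leaves the bases untouched
relabelled = replace(init, strand=-1)

# Actually flipping the insert means reverse complementing the bases too
flipped = Record(seq=reverse_complement(init.seq), strand=-1)

print(relabelled.seq[:10])
print(flipped.seq[:10])
```

Only the second version changes which end of the insert sits next to the promoter after assembly.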

In [92]:
vars(init)

{'_seq': Dseq(-330)
 CTCG..GTAA
 GAGC..CATT,
 'id': 'INIT1',
 'name': 'INIT1',
 'description': '',
 'dbxrefs': [],
 'annotations': {'molecule_type': 'DNA',
  'data_file_division': 'linear',
  'date': '07-FEB-2017',
  'source': '',
  'organism': '.',
  'taxonomy': [],
  'comment': '\nApEinfo:methylated:1'},
 '_per_letter_annotations': {},
 'features': [SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(330), strand=-1), type='CDS')],
 'map_target': None,
 'n': 5e-14,
 'path': 'files/test_init.gb'}

[SeqFeature Documentation](https://biopython.org/docs/1.75/api/Bio.SeqFeature.html)

In [126]:
init.seq[:10]

Dseq(-10)
CTCGAGGGGG
GAGCTCCCCC

In [111]:
init.list_features()

| Ft# | Label or Note | Dir | Sta | End | Len | type | orf? |
|-----|---------------|-----|-----|-----|-----|------|------|
|   0 | L:I N I T - 1 | <-- | 0   | 330 | 330 | CDS  |  no  |

## Definitive guide to insert orientation relative to extant promoters

Let's go in with the perspective that we have a promoter we want to use already existing in the plasmid. We can then read the gb file to determine its orientation. If we are inserting an initiation sequence, we want it to be as close to the promoter as possible but also in the correct orientation.

Get the feature and extract the sequence from it

In [125]:
init.extract_feature(0).features[0].extract(init).seq.reverse_complement()[0:10]

Dseq(-10)
TTACTCACAG
AATGAGTGTC

## Orienting sequences in the pipeline

In the actual pipeline we want to be able to orient sequences relative to a specific promoter automatically. If that promoter is already in the sequence, we want to extract its orientation and adjust the insert to match.

In [60]:
pFC8.list_features()

| Ft# | Label or Note | Dir | Sta  | End  | Len | type         | orf? |
|-----|---------------|-----|------|------|-----|--------------|------|
|   0 | L:T7\promoter | --> | 11   | 33   |  22 | promoter     |  no  |
|   1 | L:T7\+1\Site  | --> | 28   | 29   |   1 | misc_feature |  no  |
|   2 | L:SNRPN       | <-- | 51   | 1032 | 981 | CDS          |  no  |
|   3 | L:T3\promoter | <-- | 1046 | 1063 |  17 | promoter     |  no  |
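A possible shape for that step is sketched below in pure Python. The feature list is transcribed by hand from the table above rather than read from the GenBank file, and `promoter_strand` and `orient_to_promoter` are hypothetical helper names; in the pipeline the same lookup would run over the features pyDNA returns.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_strand(features, label):
    """Return the strand (+1 / -1) of the first promoter whose label contains `label`."""
    for feat in features:
        if feat["type"] == "promoter" and label in feat["label"]:
            return feat["strand"]
    raise KeyError(f"no promoter feature matching {label!r}")


def orient_to_promoter(core, features, label):
    """Return (oriented_core, strand) so the core reads in the promoter's direction."""
    strand = promoter_strand(features, label)
    oriented = core if strand == 1 else reverse_complement(core)
    return oriented, strand


# Hand-copied from the pFC8 feature table above (--> is +1, <-- is -1)
PFC8_FEATURES = [
    {"label": "T7 promoter", "type": "promoter", "strand": 1, "start": 11, "end": 33},
    {"label": "SNRPN", "type": "CDS", "strand": -1, "start": 51, "end": 1032},
    {"label": "T3 promoter", "type": "promoter", "strand": -1, "start": 1046, "end": 1063},
]

oriented, strand = orient_to_promoter("ATGACTGAAT", PFC8_FEATURES, "T3")
print(strand, oriented)
```

Only the designed core should be passed through this step; homology arms are defined on the backbone's top strand, so they would be added after the core has been oriented.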

In [66]:
# NOTE: feature 0 is the T7 promoter in the table above; the T3 promoter is feature 3
t3_promotor = pFC8.extract_feature(0)
t3_promotor.__dict__.keys()

dict_keys(['_seq', 'id', 'name', 'description', 'dbxrefs', 'annotations', '_per_letter_annotations', 'features', 'map_target', 'n', 'path'])

In [75]:
type(t3_promotor)

pydna.dseqrecord.Dseqrecord

In [82]:
t3_promotor.features

[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(22), strand=1), type='promoter'),
 SeqFeature(FeatureLocation(ExactPosition(17), ExactPosition(18), strand=1), type='misc_feature')]

In [93]:
def dump_file(path):
    return open(path).read()
print(dump_file(init_path))

LOCUS       INIT1                     330 bp ds-DNA     linear       07-FEB-2017
DEFINITION  .
ACCESSION   
VERSION     
SOURCE      .
  ORGANISM  .
COMMENT     
COMMENT     ApEinfo:methylated:1
ORIGIN
        1 CTCGAGGGGG GGCCCGGTAC TTCTGCTAAG AAAGGTGCTA CACTTTTCAA GACTAGATGT
       61 CTACAATGCC ACACCGTGGA AAAGGGTGGC CCACATAAGG TTGGTCCAAA CTTGCATGGT
      121 ATCTTTGGCA GACACTCTGG TCAAGCTGAA GGGTATTCGT ACACAGATGC CAATATCAAG
      181 AAAAACGTGT TGTGGGACGA AAATAACATG TCAGAGTACT TGACTAACCC AAAGAAATAT
      241 ATTCCTGGTA CCAAGATGGC CTTTGGTGGG TTGAAGAAGG AAAAAGACAG AAACGACTTA
      301 ATTACCTACT TGAAAAAAGC CTGTGAGTAA   
//
