# Insert homology arms

Each insert contains a 5' and a 3' homology arm, which include a KpnI and an EcoRI recognition site, respectively. These sequences come directly from pFC9. In this notebook I create GenBank files of the desired regions and check that they have the required features.

In [1]:
import os
from pydna.readers import read
from pydna.design import primer_design
from pydna.assembly import Assembly
from pydna.design import assembly_fragments
from pydna.dseqrecord import Dseqrecord
from Bio.Restriction import EcoRI, KpnI, SacI
from pydna.amplicon import Amplicon
from Bio.Seq import Seq
from Bio import SeqIO
import yaml
import pprint

In [2]:
pFC9_path = '../resources/files/genbank/pFC9.gb'
pFC9 = read(pFC9_path)

In [3]:
arm_length = 30

## 5' arm

The 5' arm is composed of the 30 nucleotides upstream of the KpnI recognition site.

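First make sure the 5' arm cutter is in fact a unique cutter. Note that the code below sets `five_arm_cutter` to SacI, not KpnI; the KpnI site itself is appended to the arm in a later step.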

In [4]:
five_arm_cutter = SacI
three_arm_cutter = EcoRI

In [5]:
cut_location = five_arm_cutter.search(pFC9.seq)
assert len(cut_location) == 1
print(cut_location)

[44]
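
The uniqueness check above can be illustrated without any libraries. The following is a minimal sketch, not the Biopython implementation: it counts occurrences of a palindromic recognition site on the top strand of a circular sequence, including a site that spans the origin. The function name `count_sites_circular` and the toy plasmid are hypothetical.

```python
def count_sites_circular(seq: str, site: str) -> int:
    """Count occurrences of `site` in a circular sequence (top strand only).

    For palindromic sites such as GGTACC or GAATTC, the top strand alone
    is sufficient because the reverse complement is the same string.
    """
    seq, site = seq.upper(), site.upper()
    # Append the first len(site) - 1 bases so origin-spanning sites are found.
    extended = seq + seq[: len(site) - 1]
    return sum(1 for i in range(len(seq)) if extended.startswith(site, i))


# Hypothetical 20 bp circular toy sequence: one GGTACC in the middle and
# one GAATTC that spans the origin (ends with GAA, starts with TTC).
toy_plasmid = "TTCAAAAGGTACCAAAAGAA"

kpn_count = count_sites_circular(toy_plasmid, "GGTACC")
eco_count = count_sites_circular(toy_plasmid, "GAATTC")
linear_eco_count = toy_plasmid.count("GAATTC")  # misses the origin-spanning site
```

A plain `str.count` on the linear string would miss the origin-spanning site, which is why the sequence is extended before scanning.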


Locate the recognition site in the sequence. This gives the start coordinate of the recognition site.

In [6]:
kpnI_start = pFC9.seq.find(five_arm_cutter.site)
kpnI_start

38
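
The two numbers above differ because they describe different things: `find` returns the 0-based start of the recognition site, while `search` appears to report the 1-based position of the first base after the cut. The sketch below reproduces that relationship in plain Python under this assumption. The helper `site_start_and_cut` and the toy sequences are hypothetical; `bases_before_cut` is 5 for SacI (GAGCT^C) and 1 for EcoRI (G^AATTC).

```python
def site_start_and_cut(seq: str, site: str, bases_before_cut: int):
    """Return (0-based site start, 1-based position of first base after the cut)."""
    start = seq.find(site)
    if start == -1:
        raise ValueError(f"site {site!r} not found")
    # 0-based index of the first base after the cut is start + bases_before_cut;
    # adding 1 converts it to 1-based numbering.
    cut = start + bases_before_cut + 1
    return start, cut


# Hypothetical toy sequences with the site placed at the same 0-based
# starts as in pFC9 (38 for the 5' cutter, 532 for EcoRI).
toy_five = "A" * 38 + "GAGCTC" + "A" * 10
toy_three = "C" * 532 + "GAATTC" + "C" * 10

five_start, five_cut = site_start_and_cut(toy_five, "GAGCTC", 5)
three_start, three_cut = site_start_and_cut(toy_three, "GAATTC", 1)
print(five_start, five_cut)
print(three_start, three_cut)
```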

Get the end position. Check to make sure the sequence at the start and end coordinates match the recognition site sequence.

In [7]:
kpnI_end = kpnI_start + len(five_arm_cutter.site)
assert str(pFC9.seq[kpnI_start:kpnI_end]) == str(five_arm_cutter.site)

Get the homology arm sequence. Do not include the KpnI site.

In [8]:
five_arm_start = kpnI_end-arm_length
five_arm_end = kpnI_end

In [9]:
five_arm_no_kpnI = pFC9.seq[five_arm_start:five_arm_end]

Append the KpnI recognition site to the end of the 5' arm.

In [10]:
five_arm = five_arm_no_kpnI + Dseqrecord(KpnI.site).seq
five_arm.cut(KpnI)

(Dseq(-35)
 TACG..CTCGGTAC
 ATGC..GAGC    ,
 Dseq(-5)
     C
 CATGG)

Double check to make sure last 6 nucleotides are the KpnI site.

In [11]:
assert str(five_arm[-6:]) == str(KpnI.site)
print('These must match')
print("Last 6 nt of 5' arm:", five_arm[-6:])
print('KpnI site:', KpnI.site)

These must match
Last 6 nt of 5' arm: GGTACC
KpnI site: GGTACC


Show the digestion.

In [12]:
five_arm.cut(KpnI)

(Dseq(-35)
 TACG..CTCGGTAC
 ATGC..GAGC    ,
 Dseq(-5)
     C
 CATGG)

In [13]:
len(five_arm)

36
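
The slicing logic used to build the 5' arm can be summarized in a small standalone sketch: take `arm_length` nucleotides ending at the end of an anchoring recognition site, then append a second site. This is only an illustration on plain strings; `build_five_arm` and the toy sequence are hypothetical, and the anchor site used here (GAGCTC) mirrors the `five_arm_cutter` setting above.

```python
def build_five_arm(seq: str, anchor_site: str, appended_site: str, arm_length: int) -> str:
    """Slice `arm_length` nt ending at the end of `anchor_site`, then append a site."""
    start = seq.find(anchor_site)
    if start == -1:
        raise ValueError("anchor site not found")
    end = start + len(anchor_site)
    if end < arm_length:
        raise ValueError("not enough upstream sequence for the requested arm length")
    return seq[end - arm_length:end] + appended_site


# Hypothetical toy template: 40 nt of filler, the anchor site, then more filler.
toy_template = "ACGT" * 10 + "GAGCTC" + "TTTT"
toy_arm = build_five_arm(toy_template, "GAGCTC", "GGTACC", arm_length=30)
print(len(toy_arm))
```

With a 30 nt slice and a 6 nt appended site the result is 36 nt long, matching the length reported for `five_arm`.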

## 3' arm

The 3' arm is composed of the EcoRI recognition site and 24 downstream nucleotides.

Again find the location of the EcoRI site in pFC9 and confirm it is unique.

In [14]:
eco_cut_location = three_arm_cutter.search(pFC9.seq)
assert len(eco_cut_location) == 1
print(eco_cut_location)

[534]


Get start and end positions of the recognition site.

In [15]:
EcoRI_start = pFC9.seq.find(EcoRI.site)
EcoRI_start

532

In [16]:
EcoRI_end = EcoRI_start + len(EcoRI.site)

In [17]:
assert str(pFC9.seq[EcoRI_start:EcoRI_end]) == EcoRI.site

Get nucleotides **downstream** of the EcoRI recognition site to complete the homology arm.

In [18]:
three_arm_start = EcoRI_start
three_arm_end = EcoRI_start + arm_length
three_arm = pFC9.seq[three_arm_start:three_arm_end]
assert len(three_arm) == arm_length
three_arm

Dseq(-30)
GAATTCGTCGCAGTGACCGAGGCGAGGAGG
CTTAAGCAGCGTCACTGGCTCCGCTCCTCC

Check that 3' arm begins with the recognition sequence.

In [19]:
assert three_arm.find(EcoRI.site) == 0

Show the digestion.

In [20]:
three_arm.cut(EcoRI)

(Dseq(-5)
 G
 CTTAA,
 Dseq(-29)
 AATTCGTCGCAGTGACCGAGGCGAGGAGG
     GCAGCGTCACTGGCTCCGCTCCTCC)
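
To make the sticky ends in the digestion output easier to read, here is a minimal plain-Python sketch of an EcoRI-style cut that leaves a 5' overhang. It is not how pydna implements `cut`; the function `digest_with_overhang` and its parameters are hypothetical. The bottom strand is written 3'→5' so it lines up under the top strand, as in the output above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def digest_with_overhang(top: str, site: str, top_cut: int, bottom_cut: int):
    """Cut a linear duplex at the first `site`.

    `top_cut` and `bottom_cut` are the number of site bases left of the cut
    on the top and bottom strand (1 and 5 for EcoRI, G^AATTC / CTTAA^G).
    Returns two (top, bottom) pairs; bottom strands are written 3'->5'.
    """
    s = top.find(site)
    if s == -1:
        raise ValueError("site not found")
    bottom = top.translate(COMPLEMENT)  # aligned complement, read 3'->5'
    left = (top[: s + top_cut], bottom[: s + bottom_cut])
    right = (top[s + top_cut:], bottom[s + bottom_cut:])
    return left, right


# The 3' arm sequence shown in the notebook output above.
three_arm_seq = "GAATTCGTCGCAGTGACCGAGGCGAGGAGG"
left, right = digest_with_overhang(three_arm_seq, "GAATTC", top_cut=1, bottom_cut=5)
print(left)
print(right)
```

The left fragment keeps `G` on top and `CTTAA` on the bottom, and the right fragment carries the complementary `AATT` overhang on its top strand.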

## Write output files

Write GenBank-formatted files for both homology arms.

In [21]:
from pydna.genbankrecord import GenbankRecord
import datetime

In [22]:
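def write_seq_as_genbank(record, name, output_path, version=1.0, **kwargs):
    """Write `record` to `output_path` as a GenBank file with one feature spanning it."""
    g_record = GenbankRecord(record)
    # Locus names should not contain spaces.
    g_record.locus = name.replace(' ', '_')
    g_record.id = f'v{version}'
    # Stamp the record with a checksum of its sequence.
    g_record.stamp()
    # Annotate the full sequence; extra keyword arguments are passed to the feature.
    g_record.add_feature(0, len(record), name=name, **kwargs)
    g_record.write(output_path)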

Write GenBank files for the 5' and 3' arms.

In [23]:
author = 'Ethan Holleman'
note_5 = f'Homology arm of variable region insert taken from (base 1) positions {five_arm_start+1}-{five_arm_end} of pFC9'
note_3 = f'Homology arm of variable region insert taken from (base 1) positions {three_arm_start+1}-{three_arm_end} of pFC9'
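
A short sketch of the coordinate conversion behind these notes: Python slices are 0-based and half-open, while base-1 positions are inclusive at both ends, so only the start is shifted by one and the slice end is already the last included base. The helper name `slice_to_one_based` is hypothetical.

```python
def slice_to_one_based(start: int, end: int):
    """Convert a 0-based half-open slice [start, end) to 1-based inclusive coordinates."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    # The first included base moves from index `start` to position start + 1;
    # the last included base is index end - 1, which is position `end`.
    return start + 1, end


# The 3' arm slice from this notebook: it starts at 532 and is 30 nt long.
first, last = slice_to_one_based(532, 532 + 30)
print(first, last)
```

The inclusive span `last - first + 1` equals the slice length, which is a quick way to check that the reported positions are not off by one.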

In [24]:
mod_date = datetime.datetime.now().strftime("%m/%d/%Y")

In [25]:
write_seq_as_genbank(
    five_arm, 
    "5' homology arm", 
    '../resources/files/genbank/5_prime_homology_arm.gb', 
    author=author,
    label='5_prime_HR',
    note=note_5,
    pFC9_start=five_arm_start,
    pFC9_end=five_arm_end,
    pFC9_seguid=pFC9.seq.seguid(),
    date=mod_date
)

In [26]:
write_seq_as_genbank(
    three_arm, 
    "3' homology arm", 
    '../resources/files/genbank/3_prime_homology_arm.gb', 
    author=author,
    label='3_prime_HR',
    note=note_3,
    pFC9_start=three_arm_start,
    pFC9_end=three_arm_end,
    pFC9_seguid=pFC9.seq.seguid(),
    date=mod_date
)

## Testing insert

Testing insertion into initiation construct.

Cut pFC9 with the 5' and 3' arm cutters (`five_arm_cutter` and `three_arm_cutter`).

In [32]:
pFC9_linear = pFC9.cut((five_arm_cutter, three_arm_cutter))
pFC9_lf = max(pFC9_linear, key=lambda x: len(x))
pFC9_lf

Dseqrecord(-3099)

Simulate an insert.

In [33]:
def simulate_t5(seq):
    # Arbitrarily assume 35 nucleotides are degraded from each strand.
    seq.watson = seq.watson[35:]
    seq.crick = seq.crick[:-35]
    return seq

In [34]:
insert = read('../resources/files/genbank/5_prime_homology_arm.gb') + read('../resources/files/genbank/3_prime_homology_arm.gb')
insert

Dseqrecord(-66)

In [39]:
pFC9_lf = Dseqrecord(pFC9_lf)
pFC9_lf.name = 'pFC9'
insert = Dseqrecord(insert)
insert.name = 'insert'
fragments = [insert, pFC9_lf]
a = Assembly(fragments)
assemble = a.assemble_circular()[0]

In [40]:
assemble.cut(KpnI)

(Dseqrecord(-2610), Dseqrecord(-505))
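
The idea behind the test assembly is that the insert and the linearized backbone share homologous ends, so they can be joined into a circle. The sketch below illustrates that idea on plain strings by looking for the longest suffix/prefix overlap at each junction. It is not pydna's `Assembly` algorithm, and every name and sequence in it (`terminal_overlap`, `circularize`, the toy arms) is hypothetical.

```python
def terminal_overlap(left: str, right: str, min_len: int = 10) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def circularize(insert: str, vector: str, min_len: int = 10) -> str:
    """Join insert and vector through homology at both junctions.

    Returns the circular product written as a linear string that starts
    at the first base of the insert.
    """
    a = terminal_overlap(insert, vector, min_len)  # insert 3' end into vector 5' end
    b = terminal_overlap(vector, insert, min_len)  # vector 3' end into insert 5' end
    if a == 0 or b == 0:
        raise ValueError("fragments do not share homology at both ends")
    # Drop the vector's copies of both homologous ends to avoid duplication.
    return insert + vector[a:len(vector) - b]


# Hypothetical toy fragments with 12 nt homology arms.
arm5, arm3 = "ACGTTGCAAGCT", "GGATCCTTAGGC"
toy_insert = arm5 + "TTTTTT" + arm3
toy_vector = arm3 + "CACACACACACA" + arm5
product = circularize(toy_insert, toy_vector)
print(len(product))
```

Each shared arm is counted once in the product, so the circular length is the sum of the fragment lengths minus both overlaps.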