# Insert homology arms

Each insert contains a 5' and a 3' homology arm, which include a KpnI and an EcoRI recognition site, respectively. These sequences come directly from pFC8. In this notebook I create GenBank files for the desired regions and check that they contain the required features.

In [1]:
# Only the pydna reader and the two enzymes are used in this notebook
from pydna.readers import read
from Bio.Restriction import KpnI, EcoRI
import pprint

In [2]:
pFC8_path = '../resources/files/genbank/pFC8.gb'
pFC8 = read(pFC8_path)

In [3]:
arm_length = 30

## 5' arm

The 5' arm is composed of the KpnI recognition site and the 24 nucleotides immediately upstream of it, 30 nucleotides in total.

First make sure KpnI is in fact a unique cutter.

In [4]:
cut_location = KpnI.search(pFC8.seq)
assert len(cut_location) == 1
print(cut_location)

[1033]
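Biopython's `search` takes a `linear` argument that defaults to `True`; for a circular molecule such as a plasmid, a site spanning the origin is only found when the sequence is treated as circular. As a dependency-free illustration of what "unique cutter" means on a plain string, here is a minimal sketch. `count_site` is a hypothetical helper (not part of pydna or Biopython), and it assumes a palindromic site, which holds for KpnI (GGTACC) and EcoRI (GAATTC).

```python
def count_site(seq: str, site: str, circular: bool = False) -> int:
    """Count occurrences of a recognition site, including overlapping ones.

    Assumes a palindromic site, so scanning the top strand also accounts
    for the bottom strand.
    """
    seq, site = seq.upper(), site.upper()
    if circular:
        # Extend by len(site) - 1 bases so a site spanning the origin is found once
        seq = seq + seq[: len(site) - 1]
    count = 0
    pos = seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)  # step by 1 to allow overlapping matches
    return count


# Toy sequence where GGTACC only exists across the origin of a circular molecule
toy = "TACCAAAAGG"
print(count_site(toy, "GGTACC"))                 # 0
print(count_site(toy, "GGTACC", circular=True))  # 1
```

A site counts as a unique cutter when this count is exactly 1.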


Locate the recognition site in the sequence. `find` returns the 0-based start coordinate of the first match.

In [5]:
kpnI_start = pFC8.seq.find(KpnI.site)
kpnI_start

1027
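The two KpnI numbers differ (1033 from `search`, 1027 from `find`) because they use different conventions. The values in this notebook are consistent with `search` reporting the 1-based position of the first nucleotide after the top-strand cut, while `find` reports the 0-based start of the site. The sketch below reconciles them; `CUT_OFFSET` and `predicted_search_position` are hypothetical names, and the cut offsets are hard-coded from the standard cut notation rather than read from Biopython. The EcoRI values come from the 3' arm section below.

```python
# Top-strand cut offsets within each recognition site, from the standard
# cut notation KpnI GGTAC^C and EcoRI G^AATTC (hard-coded for illustration).
CUT_OFFSET = {"KpnI": 5, "EcoRI": 1}


def predicted_search_position(site_start: int, enzyme: str) -> int:
    """Convert a 0-based site start (as returned by find) into the
    1-based position of the first base after the top-strand cut."""
    # + offset moves from the site start to the cut; + 1 converts to 1-based
    return site_start + CUT_OFFSET[enzyme] + 1


print(predicted_search_position(1027, "KpnI"))  # 1033
print(predicted_search_position(538, "EcoRI"))  # 540
```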

Compute the end position, then check that the sequence between the start and end coordinates matches the recognition site.

In [6]:
kpnI_end = kpnI_start + len(KpnI.site)
assert str(pFC8.seq[kpnI_start:kpnI_end]) == str(KpnI.site)

Get the homology arm sequence.

In [7]:
five_arm_start = kpnI_end-arm_length
five_arm_end = kpnI_end

In [8]:
five_arm = pFC8.seq[five_arm_start:five_arm_end]
assert len(five_arm) == arm_length
five_arm

Dseq(-30)
CTCCAAGACCTCGAGGGGGGGCCCGGTACC
GAGGTTCTGGAGCTCCCCCCCGGGCCATGG
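The arm arithmetic can be checked on a plain string. This is a minimal sketch assuming the site occurs once (as asserted above), using a hypothetical helper `five_prime_arm`: the arm ends at the last base of the site and starts `arm_length` bases earlier, so with `arm_length = 30` it holds 24 upstream bases plus the 6-nt site.

```python
from typing import Tuple


def five_prime_arm(seq: str, site: str, arm_length: int = 30) -> Tuple[int, int, str]:
    """Return (start, end, arm) for an arm that ends with `site`.

    Coordinates are 0-based and half-open, mirroring
    five_arm_start = kpnI_end - arm_length and five_arm_end = kpnI_end.
    """
    site_start = seq.find(site)
    if site_start == -1:
        raise ValueError(f"{site} not found")
    site_end = site_start + len(site)
    arm_start = site_end - arm_length
    if arm_start < 0:
        raise ValueError("not enough sequence upstream of the site")
    return arm_start, site_end, seq[arm_start:site_end]


# Toy input: 30 A's, then the KpnI site, then some trailing sequence
start, end, arm = five_prime_arm("A" * 30 + "GGTACC" + "T" * 10, "GGTACC", 10)
print(start, end, arm)  # 26 36 AAAAGGTACC
```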

Double-check that the last 6 nucleotides are the KpnI site.

In [9]:
assert str(five_arm[-6:]) == str(KpnI.site)
print('These must match')
print("Last 6 nt of 5' arm:", five_arm[-6:])
print('KpnI site:', KpnI.site)

These must match
Last 6 nt of 5' arm: GGTACC
KpnI site: GGTACC


Show the digestion.

In [10]:
five_arm.cut(KpnI)

(Dseq(-29)
 CTCCAAGACCTCGAGGGGGGGCCCGGTAC
 GAGGTTCTGGAGCTCCCCCCCGGGC,
 Dseq(-5)
     C
 CATGG)
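The digest above shows KpnI leaving a 4-nt 3' overhang: the left fragment's top strand extends GTAC past its bottom strand. For a palindromic site, the overhang type and sequence follow from where the top strand is cut, as this hypothetical `overhang` helper sketches. The cut offsets are passed in by hand (KpnI GGTAC^C, EcoRI G^AATTC) rather than read from Biopython.

```python
from typing import Tuple


def overhang(site: str, top_cut: int) -> Tuple[str, str]:
    """Return (overhang type, top-strand overhang sequence) for a palindromic
    site cut `top_cut` bases from its 5' end on the top strand."""
    # For a palindromic site the bottom-strand cut mirrors the top-strand cut
    bottom_cut = len(site) - top_cut
    left, right = sorted((top_cut, bottom_cut))
    if top_cut < bottom_cut:
        kind = "5'"  # top-strand cut lies upstream of the bottom-strand cut
    elif top_cut > bottom_cut:
        kind = "3'"  # top-strand cut lies downstream of the bottom-strand cut
    else:
        kind = "blunt"
    return kind, site[left:right]


print(overhang("GGTACC", 5))  # ("3'", 'GTAC')  KpnI
print(overhang("GAATTC", 1))  # ("5'", 'AATT')  EcoRI
```

The same rule explains the EcoRI digest in the 3' arm section, where the right fragment's top strand starts with an unpaired AATT.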

## 3' arm

The 3' arm is composed of the EcoRI recognition site and 24 downstream nucleotides.

Again, find the location of the EcoRI site in pFC8 and confirm that it is unique.

In [11]:
eco_cut_location = EcoRI.search(pFC8.seq)
assert len(eco_cut_location) == 1
print(eco_cut_location)

[540]


Get start and end positions of the recognition site.

In [12]:
EcoRI_start = pFC8.seq.find(EcoRI.site)
EcoRI_start

538

In [13]:
EcoRI_end = EcoRI_start + len(EcoRI.site)

In [14]:
assert str(pFC8.seq[EcoRI_start:EcoRI_end]) == str(EcoRI.site)

Get nucleotides **downstream** of the EcoRI recognition site to complete the homology arm.

In [15]:
three_arm_start = EcoRI_start
three_arm_end = EcoRI_start + arm_length
three_arm = pFC8.seq[three_arm_start:three_arm_end]
assert len(three_arm) == arm_length
three_arm

Dseq(-30)
GAATTCCCCCCCCAGTCGCCCCACGTACCC
CTTAAGGGGGGGGTCAGCGGGGTGCATGGG
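Mirroring the 5' arm, the 3' arm starts at the first base of the EcoRI site, so it holds the 6-nt site plus `arm_length - 6` (here 24) downstream bases. A minimal plain-string sketch, assuming the site occurs once and using a hypothetical `three_prime_arm` helper; unlike the notebook cell, it also raises an error if too little sequence lies downstream of the site.

```python
from typing import Tuple


def three_prime_arm(seq: str, site: str, arm_length: int = 30) -> Tuple[int, int, str]:
    """Return (start, end, arm) for an arm that starts with `site`.

    Coordinates are 0-based and half-open, mirroring
    three_arm_start = EcoRI_start and three_arm_end = EcoRI_start + arm_length.
    """
    site_start = seq.find(site)
    if site_start == -1:
        raise ValueError(f"{site} not found")
    arm_end = site_start + arm_length
    if arm_end > len(seq):
        raise ValueError("not enough sequence downstream of the site")
    return site_start, arm_end, seq[site_start:arm_end]


# Toy input: leading sequence, the EcoRI site, then 30 C's
start, end, arm = three_prime_arm("T" * 5 + "GAATTC" + "C" * 30, "GAATTC", 10)
print(start, end, arm)  # 5 15 GAATTCCCCC
```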

Check that the 3' arm begins with the recognition sequence.

In [16]:
assert three_arm.find(EcoRI.site) == 0

Show the digestion.

In [17]:
three_arm.cut(EcoRI)

(Dseq(-5)
 G
 CTTAA,
 Dseq(-29)
 AATTCCCCCCCCAGTCGCCCCACGTACCC
     GGGGGGGGTCAGCGGGGTGCATGGG)

## Write output files

Write GenBank-formatted files for both homology arms.

In [50]:
from pydna.genbankrecord import GenbankRecord

In [58]:
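def write_seq_as_genbank(record, name, output_path, **kwargs):
    """Wrap a sequence in a GenbankRecord, annotate it with one full-length
    feature and write it to output_path in GenBank format."""
    g_record = GenbankRecord(record)
    # GenBank LOCUS names are a single token, so replace spaces
    g_record.locus = name.replace(' ', '_')
    # Add a checksum stamp to the record description
    g_record.stamp()
    # One feature spanning the whole sequence; extra kwargs (author, label, note)
    # are passed through to add_feature
    g_record.add_feature(0, len(record), name=name, **kwargs)
    g_record.write(output_path)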

Write GenBank files for the 5' and 3' arms.

In [22]:
author = 'Ethan Holleman'
note_5 = f'Homology arm of variable region insert taken from (1-based) positions {five_arm_start+1}-{five_arm_end} of pFC8'
note_3 = f'Homology arm of variable region insert taken from (1-based) positions {three_arm_start+1}-{three_arm_end} of pFC8'
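The positions in these notes bridge two conventions: Python slices are 0-based and half-open, whereas GenBank coordinates are 1-based and inclusive. A slice `[start:end]` therefore covers positions `start + 1` through `end`, so a 30-nt arm spans exactly 30 positions. A small sketch of the conversion (`slice_to_genbank` is a hypothetical helper), using arm coordinates derived from the `find` results above:

```python
from typing import Tuple


def slice_to_genbank(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open Python slice [start:end] into 1-based,
    inclusive coordinates as used in GenBank feature locations."""
    return start + 1, end


# 5' arm: kpnI_end = 1027 + 6 = 1033, slice [1033 - 30:1033]
# 3' arm: EcoRI_start = 538, slice [538:538 + 30]
print(slice_to_genbank(1033 - 30, 1033))  # (1004, 1033)
print(slice_to_genbank(538, 538 + 30))    # (539, 568)
```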

In [60]:
write_seq_as_genbank(
    five_arm, 
    "5' homology arm", 
    '../resources/files/genbank/5_prime_homology_arm.gb', 
    author=author,
    label='5_prime_HR',
    note=note_5,
)

In [61]:
write_seq_as_genbank(
    three_arm, 
    "3' homology arm", 
    '../resources/files/genbank/3_prime_homology_arm.gb', 
    author=author,
    label='3_prime_HR',
    note=note_3,
)