# Tac series primers

Both the tac initiation and tac termination series constructs depend on amplifying insert regions from complete T7 initiation and termination constructs. Here I design the Gibson assembly primers for both tac series. This notebook is a version of the one in the `notes` directory, adapted for use with snakemake.

Process snakemake inputs.

In [1]:
from pathlib import Path

def iter_dir(dir_path):
    # return all files in top level of directory as a list
    return list(Path(str(dir_path)).iterdir())

In [39]:
tac_backbone = str(snakemake.input['tac_backbone'])

t7_init_files = iter_dir(snakemake.input['t7_init'])
t7_term_files = iter_dir(snakemake.input['t7_term'])

t7_init_constructs_dir = str(snakemake.input['t7_init'])
t7_term_constructs_dir = str(snakemake.input['t7_term'])

Transfer values over to variables from original notebook.

In [12]:
t7_init_construct_path = str(t7_init_files[0])
t7_term_construct_path = str(t7_term_files[0])
pFC53T1T1_path = str(tac_backbone)

In [4]:
import numpy as np
import os
from pathlib import Path
from Bio.Restriction import KpnI, HindIII
from pydna.genbankrecord import GenbankRecord
from pydna.readers import read
from pydna.dseqrecord import Dseqrecord
from pydna.design import primer_design
from pydna.amplify import pcr
from pydna.assembly import Assembly
from pydna.design import assembly_fragments

## Helper functions

In [5]:
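def get_feature_by_name(record, feature_name):
    # Extract every feature once and index the resulting records by name
    # (a later feature overwrites an earlier one with the same name).
    extracted = (record.extract_feature(i) for i in range(len(record.features)))
    feature_name_dict = {feature.name: feature for feature in extracted}
    # Returns -1 when no feature has the requested name.
    return feature_name_dict.get(feature_name, -1)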

In [6]:
def find_unique_cut_site(record, enzyme):
    assert enzyme in record.once_cutters()
    return record.seq.find(enzyme.site)

In [7]:
def extend_primer_until_g(source_record, start, end, rc=False):
    
    pass

In [8]:
homology_length = 20

## T7 initiation series primers

This series requires a forward primer that binds the anchor region and shares homology with the KpnI digestion site + tac promoter sequence, and a reverse primer that binds the far end of an extension region downstream of the 3' homology arm and shares homology with the T1T2 terminators + HindIII digestion site.


In [13]:
t7_init = GenbankRecord(read(t7_init_construct_path))
pFC53t1t2 = GenbankRecord(read(pFC53T1T1_path))
t7_init.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  | Len | type         | orf? |
|-----|------------------|-----|------|------|-----|--------------|------|
|   0 | L:T3 promoter T3 | <-- | 513  | 530  |  17 | promoter     |  no  |
|   1 | L:T7 promoter T7 | --> | 3067 | 3089 |  22 | promoter     |  no  |
|   2 | L:T7 +1 Site T7\ | --> | 3084 | 3085 |   1 | misc_feature |  no  |
|   3 | L:5_prime_HR     | --> | 3070 | 3106 |  36 | misc         |  no  |
|   4 | L:Anchor region  | --> | 3106 | 3121 |  15 | CDS          |  no  |
|   5 | L:Variable regio | --> | 3121 | 3321 | 200 | CDS          |  no  |
|   6 | L:3_prime_HR     | --> | 0    | 3322 |  30 | misc         |  no  |

In [14]:
pFC53t1t2.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  |  Len | type         | orf? |
|-----|------------------|-----|------|------|------|--------------|------|
|   0 | L:Airn           | <-- | 378  | 1765 | 1387 | CDS          |  no  |
|   1 | L:Amp            | <-- | 2804 | 3637 |  833 | CDS          |  no  |
|   2 | L:ApaLI          | --> | 2295 | 2301 |    6 | misc_feature |  no  |
|   3 | L:ApaLI          | --> | 3541 | 3547 |    6 | misc_feature |  no  |
|   4 | L:Repeat         | --> | 985  | 1058 |   73 | repeat_unit  |  no  |
|   5 | L:Repeat\2       | --> | 1066 | 1140 |   74 | repeat_unit  |  no  |
|   6 | L:tac\promoter   | <-- | 1765 | 1794 |   29 | promoter     |  no  |
|   7 | L:T1T2\terminato | <-- | 6    | 378  |  372 | terminator   |  no  |

In [15]:
anchor_region = get_feature_by_name(t7_init, 'Anchor_region')
anchor_region

Dseqrecord(-15)

The HindIII cut site is adjacent to the T1T2 terminators.

In [16]:
tac_promoter_homology_start = pFC53t1t2.features[6].location.start - (len(KpnI.site))
tac_promoter_homology_end =  pFC53t1t2.features[6].location.start + homology_length
tac_promoter_homology = Dseqrecord(pFC53t1t2.seq[tac_promoter_homology_start:tac_promoter_homology_end])
tac_promoter_homology.seq

Dseq(-26)
GGTACCCATTATACGAGCCGATGATT
CCATGGGTAATATGCTCGGCTACTAA

Take the reverse complement of the tac promoter homology and prepend it to the anchor region to create the forward primer.

In [17]:
t7_init_tac_primer = tac_promoter_homology.reverse_complement() + anchor_region
print(t7_init_tac_primer.seq)

AATCATCGGCTCGTATAATGGGTACCCAAACACTCCCTCGG
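
As a quick cross-check that does not depend on pydna, the forward primer can be rebuilt from the sequences printed above with plain string operations. This is only a sketch: `reverse_complement` is an illustrative helper (not pydna's method), and the anchor sequence is inferred from the 3' end of the printed primer rather than read from the GenBank file.

```python
# Plain-Python sketch; reverse_complement is an illustrative helper, not pydna's.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]

# Top strand of tac_promoter_homology as printed above (KpnI site + 20 nt).
tac_homology_top = "GGTACCCATTATACGAGCCGATGATT"
# Anchor region, inferred from the 3' end of the printed primer (assumption).
anchor = "CAAACACTCCCTCGG"

# Homology tail (5' end of the primer) followed by the binding region (3' end).
tail = reverse_complement(tac_homology_top)
fwd_primer = tail + anchor

print(fwd_primer)
# The homology tail should end with the KpnI recognition site.
print(tail.endswith("GGTACC"))
```

The tail carries the Gibson homology and the anchor is the part that anneals to the template, which is why the anchor sits at the 3' end.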


The second primer binds the far end of an extension region and shares homology with the T1T2 terminators.

In [18]:
extension_region_length = 300

In [19]:
three_prime_arm = t7_init.features[6]
start = three_prime_arm.location.start + len(three_prime_arm)
end = start + extension_region_length
extension_region = t7_init.seq[start:end]

In [20]:
# last 20 nucleotides
extension_binding_site = extension_region[-homology_length:].reverse_complement()
extension_binding_site

Dseq(-20)
CTGATGCCCCCTCCTCTACA
GACTACGGGGGAGGAGATGT

Define the homology target for the T1T2 terminators.

In [21]:
t1t2_homology_start = pFC53t1t2.features[7].location.end - homology_length
t1t2_homology_end = pFC53t1t2.features[7].location.end + (len(HindIII.site))

t1t2_homology = pFC53t1t2[t1t2_homology_start:t1t2_homology_end]
t1t2_homology.seq

Dseq(-26)
TCGTTTTATTTGATGCCTGGAAGCTT
AGCAAAATAAACTACGGACCTTCGAA

In [22]:
t7_init_t1t2_primer = t1t2_homology.seq + extension_binding_site 
print(t7_init_t1t2_primer)

TCGTTTTATTTGATGCCTGGAAGCTTCTGATGCCCCCTCCTCTACA


## T7 termination series primers

In [23]:
t7_term = GenbankRecord(read(t7_term_construct_path))
t7_term.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  | Len | type         | orf? |
|-----|------------------|-----|------|------|-----|--------------|------|
|   0 | L:T3\promoter    | <-- | 14   | 31   |  17 | promoter     |  no  |
|   1 | L:T7\promoter    | --> | 2568 | 2590 |  22 | promoter     |  no  |
|   2 | L:T7\+1\Site     | --> | 2585 | 2586 |   1 | misc_feature |  no  |
|   3 | L:Placeholder st | --> | 2613 | 2813 | 200 | misc         |  no  |
|   4 | L:Variable regio | <-- | 2823 | 3023 | 200 | CDS          |  no  |
|   5 | L:Anchor region  | <-- | 3023 | 3038 |  15 | CDS          |  no  |

In this series, the primer that binds the anchor region carries homology to the T1T2 terminators, which places the variable region adjacent to the terminators.

The anchor region should be exactly the same sequence as in the initiation series.

In [24]:
anchor_region_term = get_feature_by_name(t7_term, 'Anchor_region')
assert anchor_region_term.seq == anchor_region.seq

In [25]:
t7_term_t1t2_primer =  t1t2_homology + anchor_region_term
print(t7_term_t1t2_primer.seq)

TCGTTTTATTTGATGCCTGGAAGCTTCAAACACTCCCTCGG


The second primer binds the start of the strong initiator, which is currently a placeholder, so this primer should *not* be ordered until the placeholder is updated.

In [27]:
strong_init_target_start = t7_term.features[3].location.start
strong_init_target_end = strong_init_target_start + homology_length
strong_init_target = t7_term.seq[strong_init_target_start:strong_init_target_end]
strong_init_target

Dseq(-20)
GCTTTACTCTCATAAAGAGC
CGAAATGAGAGTATTTCTCG

In [28]:
t7_term_tac_primer =  tac_promoter_homology.reverse_complement() + strong_init_target
t7_term_tac_primer.seq

Dseq(-46)
AATC..GAGC
TTAG..CTCG

Collect all primers and target constructs into one location.

In [29]:
primers = [
    (t7_init_tac_primer, t7_init_t1t2_primer, t7_init),
    (t7_term_tac_primer, t7_term_t1t2_primer, t7_term)
]

## Verify amplification

Verify init amplicon contents.

In [30]:
init_product = pcr(*primers[0])
term_product = pcr(*primers[1])

In [31]:
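# find() returns -1 when the subsequence is absent (and 0 for a match at the
# start), so compare against -1 instead of relying on truthiness
assert init_product.seq.find(t7_init.extract_feature(4).seq) != -1
# check to make sure extension region is present
assert init_product.seq.find(extension_region) != -1
print('Init amplicon passes all tests')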

Init amplicon passes all tests
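
One caveat about containment checks written with `find`: assuming the sequence object's `find` follows the convention of Python's `str.find`, it returns an index rather than a boolean, so a bare `assert seq.find(sub)` passes when the subsequence is missing (`-1` is truthy) and fails when it sits at position 0. The sketch below illustrates this with plain strings; `contains` is a hypothetical helper name.

```python
# Illustration with plain strings; the same index convention is assumed
# for the sequence objects used in this notebook.
amplicon = "AATCATCGGCTCGTATAATGGGTACCCAAACACTCCCTCGG"

at_start = amplicon.find("AATCATCG")  # match at index 0, which is falsy
missing = amplicon.find("TTTTTTTT")   # no match gives -1, which is truthy

def contains(seq, sub):
    # Explicit comparison avoids both pitfalls.
    return seq.find(sub) != -1

print(at_start, bool(at_start))
print(missing, bool(missing))
print(contains(amplicon, "AATCATCG"), contains(amplicon, "TTTTTTTT"))
```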


Do the same for the termination amplicon.

In [32]:
# should contain the reverse complement of the anchor region
# (feature 4 of t7_init; the assertion above confirms both series share it)
assert term_product.seq.find(t7_init.extract_feature(4).seq.reverse_complement()) != -1
# should also contain the forward strong initiator sequence
assert term_product.seq.find(t7_term.extract_feature(3).seq) != -1
print('Term amplicon passes all tests')

Term amplicon passes all tests


## Verify assembly

Digest pFC53 and select the large fragment.

In [33]:
pFC53_large_fragment = max(pFC53t1t2.cut((KpnI, HindIII)), key=lambda f: len(f))
pFC53_large_fragment

Dseqrecord(-2943)
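
Conceptually, the double digest cuts the circular plasmid at the KpnI and HindIII sites and keeps the longer of the two resulting fragments. A minimal plain-Python sketch of that selection is given below. It uses a made-up toy sequence, cuts at the start of each recognition site, and ignores overhangs, all of which pydna's `cut` handles properly. The function name `double_digest_circular` and the toy plasmid are illustrative assumptions.

```python
def double_digest_circular(seq, site_a, site_b):
    # Each enzyme must cut exactly once (sites spanning the origin are ignored).
    assert seq.count(site_a) == 1 and seq.count(site_b) == 1
    i, j = sorted((seq.find(site_a), seq.find(site_b)))
    # Cutting a circle at two positions yields two linear fragments.
    inner = seq[i:j]
    outer = seq[j:] + seq[:i]
    return [inner, outer]

# Toy circular plasmid with one KpnI site (GGTACC) and one HindIII site (AAGCTT).
toy_plasmid = "CCCCC" + "GGTACC" + "ATATAT" + "AAGCTT" + "G" * 40

fragments = double_digest_circular(toy_plasmid, "GGTACC", "AAGCTT")
large_fragment = max(fragments, key=len)
print([len(f) for f in fragments], len(large_fragment))
```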

Assemble fragments into final construct.

In [34]:
init_assembly = Assembly([init_product, pFC53_large_fragment], limit=20).assemble_circular()[0]
init_assembly

In [35]:
init_assembly.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  | Len | type         | orf? |
|-----|------------------|-----|------|------|-----|--------------|------|
|   0 | L:name           | --> | 20   | 41   |  21 | primer_bind  |  no  |
|   1 | L:Anchor region  | --> | 26   | 41   |  15 | CDS          |  no  |
|   2 | L:Variable regio | --> | 41   | 241  | 200 | CDS          |  no  |
|   3 | L:3_prime_HR     | --> | 241  | 271  |  30 | misc         |  no  |
|   4 | L:name           | <-- | 552  | 575  |  23 | primer_bind  |  no  |
|   5 | L:T1T2\terminato | --> | 582  | 954  | 372 | terminator   |  no  |
|   6 | L:Amp            | --> | 1643 | 2476 | 833 | CDS          |  no  |
|   7 | L:ApaLI          | <-- | 1733 | 1739 |   6 | misc_feature |  no  |
|   8 | L:ApaLI          | <-- | 2979 | 2985 |   6 | misc_feature |  no  |
|   9 | L:tac\promoter   | --- | 0    | 3491 |  29 | promoter     |  no  |

In [36]:
term_assembly = Assembly([term_product, pFC53_large_fragment], limit=20).assemble_circular()[0]
term_assembly

In [37]:
term_assembly.list_features()

| Ft# | Label or Note    | Dir | Sta  | End  | Len | type         | orf? |
|-----|------------------|-----|------|------|-----|--------------|------|
|   0 | L:Placeholder st | --> | 26   | 226  | 200 | misc         |  no  |
|   1 | L:name           | --> | 26   | 46   |  20 | primer_bind  |  no  |
|   2 | L:Variable regio | <-- | 236  | 436  | 200 | CDS          |  no  |
|   3 | L:Anchor region  | <-- | 436  | 451  |  15 | CDS          |  no  |
|   4 | L:name           | <-- | 436  | 451  |  15 | primer_bind  |  no  |
|   5 | L:T1T2\terminato | --> | 461  | 833  | 372 | terminator   |  no  |
|   6 | L:Amp            | --> | 1522 | 2355 | 833 | CDS          |  no  |
|   7 | L:ApaLI          | <-- | 1612 | 1618 |   6 | misc_feature |  no  |
|   8 | L:ApaLI          | <-- | 2858 | 2864 |   6 | misc_feature |  no  |
|   9 | L:tac\promoter   | --- | 0    | 3370 |  29 | promoter     |  no  |

This confirms that the primers work for the insert tested (insert 1). The same check also needs to pass for all other inserts. Here I use files produced by the snakemake pipeline; if they do not exist, these checks will not run or will fail.

Replace below directory paths with those from snakemake input.

In [42]:
def test_ensemble(construct_dir, primers):
    cd = Path(construct_dir)
    if cd.is_dir():
        for each_construct in cd.iterdir():
            if each_construct.suffix == '.gb':
                print(each_construct)
                template = GenbankRecord(read(str(each_construct)))
                amplicon = pcr(*primers, template)
                assert amplicon
                # assemble the amplicon from *this* construct, not the
                # term_product made earlier from the first construct
                construct = Assembly(
                    [amplicon, pFC53_large_fragment], 
                    limit=20
                ).assemble_circular()[0]
                assert construct
                print(f'{each_construct.name} passed')
    else:
        print(f'{construct_dir} does not exist')

In [43]:
if os.path.exists(t7_init_constructs_dir):
    test_ensemble(t7_init_constructs_dir, primers[0][0:2])
if os.path.exists(t7_term_constructs_dir):
    test_ensemble(t7_term_constructs_dir, primers[1][0:2])

../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-4.gb
T7_init_VR-4.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-24.gb
T7_init_VR-24.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-29.gb
T7_init_VR-29.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-30.gb
T7_init_VR-30.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-10.gb
T7_init_VR-10.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-3.gb
T7_init_VR-3.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-19.gb
T7_init_VR-19.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-9.gb
T7_init_VR-9.gb passed
../output/insert_sequences_100_v2/constructs/T7_initiation_series/T7_init_VR-1.gb
T7_init_VR-1.gb passed
../output/insert_sequences_100_v2/constructs/

## Write primers

In [44]:
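tac_init_primer_path = str(snakemake.output['init_primers'])
tac_term_primer_path = str(snakemake.output['term_primers'])

# make sure the output directory exists for both primer files
Path(tac_init_primer_path).parent.mkdir(parents=True, exist_ok=True)
Path(tac_term_primer_path).parent.mkdir(parents=True, exist_ok=True)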

NameError: name 'snakemake' is not defined
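
The `NameError` above is what this cell produces when the notebook is run interactively, since the `snakemake` object only exists when the notebook is executed through a snakemake rule. One possible guard, sketched below, defines a stand-in object when the real one is missing. All paths in the fallback are hypothetical placeholders, and the stand-in only supports the dictionary-style access used in this notebook.

```python
from types import SimpleNamespace

try:
    snakemake  # injected automatically when run through a snakemake rule
except NameError:
    # Hypothetical fallback for interactive use; adjust paths to the local layout.
    snakemake = SimpleNamespace(
        input={
            "tac_backbone": "resources/tac_backbone.gb",
            "t7_init": "output/constructs/T7_initiation_series",
            "t7_term": "output/constructs/T7_termination_series",
        },
        output={
            "init_primers": "output/primers/tac_init_primers.fa",
            "term_primers": "output/primers/tac_term_primers.fa",
        },
    )

print(snakemake.output["init_primers"])
```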

In [35]:
def label_primer_pair(primers, series):
    # clean up primer labels and mark with hash
    primers = list(primers)
    primers[0] = Dseqrecord(primers[0])
    primers[1] = Dseqrecord(primers[1])
    primers[0].description = ''
    primers[1].description = ''
    primers[0].id = f'{series}_fwd_primer'
    primers[1].id = f'{series}_rev_primer'
    primers[0].stamp()
    primers[1].stamp()
    
    return primers

In [36]:
def write_primers(primers, output_path):
    content = ''
    with open(str(output_path), 'w') as handle:
        for primer in primers:
            fa = primer.format('fasta') + '\n'
            handle.write(fa)
            content += fa
    return content
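
After writing, it can be worth reading the file back to confirm that the primer names and sequences survived the round trip. The sketch below uses a small hand-written FASTA parser (`read_fasta` is a hypothetical helper, not part of pydna or Biopython) and a temporary file holding the two initiation primer sequences printed earlier in this notebook.

```python
import tempfile
from pathlib import Path

def read_fasta(path):
    # Minimal FASTA reader: maps the first word of each header to its sequence.
    records = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "primers.fa"
    # Same layout as write_primers: each record followed by a blank line.
    fasta_path.write_text(
        ">tac_init_fwd_primer\nAATCATCGGCTCGTATAATGGGTACCCAAACACTCCCTCGG\n\n"
        ">tac_init_rev_primer\nTCGTTTTATTTGATGCCTGGAAGCTTCTGATGCCCCCTCCTCTACA\n\n"
    )
    primers_read = read_fasta(fasta_path)

print(sorted(primers_read))
```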

In [37]:
init_primers = label_primer_pair(primers[0][:2], 'tac_init')
term_primers = label_primer_pair(primers[1][:2], 'tac_term')
print(init_primers[0].id)
print(init_primers[1].id)

tac_init_fwd_primer
tac_init_rev_primer


In [38]:
print(write_primers(init_primers, tac_init_primer_path))

>tac_init_fwd_primer SEGUID_StueYkDsp0g9rG0FYTShzQ1ores
AATCATCGGCTCGTATAATGGGTACCCACGTTTGGCCACCA
>tac_init_rev_primer SEGUID_mP_kGr30Nx0VjMatTGUSzM1p7V0
TCGTTTTATTTGATGCCTGGAAGCTTGTGCACACAGCCCAGCTTGG



In [39]:
print(write_primers(term_primers, tac_term_primer_path))

>tac_term_fwd_primer SEGUID_84t5SYM-4gvSXkhbX9iZic_SKLc
AATCATCGGCTCGTATAATGGGTACCGCTTTGCGGAGCGAGGACCA
>tac_term_rev_primer SEGUID_qyadq-psR4Sb0RfEY0Kwz98EmVU
TCGTTTTATTTGATGCCTGGAAGCTTCACGTTTGGCCACCA

