# Oligo Tool

## How it works:

Retrons are genetically encoded RNA/ssDNA sequences that can be targeted to edit specified regions of homologous DNA when their reverse transcriptase and a set of recombineering proteins are expressed. This allows you to program mutations, insertions, and deletions at your site of interest.


This tool generates libraries of oligos that can be inserted into a retron vector via Golden Gate cloning. Although it was built with a specific application (Phage Assisted Evolution) and vector (RcP9) in mind, to program mutations on M13 phage, it could in principle be extended to generate retron libraries of any sort.
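
As a minimal sketch of the oligo layout this tool produces, the snippet below concatenates barcode, Type IIS site, overhang, retron, overhang, Type IIS site, and barcode, and derives the two amplification primers from the flanks. The flanking sequences are the notebook's default values; `assemble_oligo`, `reverse_complement`, and the short retron string are hypothetical and only serve the illustration.

```python
COMPLEMENT = str.maketrans('ACGT', 'TGCA')

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def assemble_oligo(retron, barcodes, goldengate_sites, overhangs):
    """Wrap a retron in overhangs, Type IIS sites and barcodes (hypothetical helper)."""
    barcode_l, barcode_r = barcodes
    site_l, site_r = goldengate_sites
    overhang_l, overhang_r = overhangs
    left_flank = f'{barcode_l}{site_l}{overhang_l}'
    right_flank = f'{overhang_r}{site_r}{barcode_r}'
    oligo = f'{left_flank}{retron}{right_flank}'
    # The forward primer is the left flank; the reverse primer anneals to the right flank
    return oligo, left_flank, reverse_complement(right_flank)

oligo, fwd_primer, rev_primer = assemble_oligo(
    retron='ACGTACGTACGT',  # placeholder, not a real retron
    barcodes=('TTATAATCATCCTCCCCGGC', 'CCAAATAGGATGTGTGCTCG'),
    goldengate_sites=('CGTCTCA', 'TGAGACG'),
    overhangs=('ATTC', 'GTAC'),
)
print(oligo)
print('FWD:', fwd_primer)
print('REV:', rev_primer)
```

Keeping the flanks identical across a library means one primer pair amplifies every oligo that carries the same barcode.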


## Usage:

The tool is split into three main components:  

### Barcode Generator:
- **barcode_number**: Selects one primer pair from a set of 3000 bioorthogonal barcode pairs (doi.org/10.1073/pnas.0812506106) according to the input number. This lets the user tag a generated library set. Forward and reverse primers for library PCR amplification of that pair are automatically created as downloadable FASTA files in the file folder.
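
The lookup can be sketched roughly as follows: forward and reverse primers are matched by a shared header prefix, and the pair for a given number is returned in the `left:right` format used by the `barcode` parameter. The `skpp-<n>-F` / `skpp-<n>-R` header layout is an assumption based on how the notebook code builds its keys; the sequences are placeholders rather than real barcodes, and `parse_primers` and `get_pair` are hypothetical helper names.

```python
def parse_primers(fasta_text):
    """Map 'skpp-<n>' to the primer sequence found in a FASTA string."""
    primers = {}
    key = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # '>skpp-11-F' -> 'skpp-11' (assumed header layout)
            key = '-'.join(line[1:].split('-')[:2])
        elif key is not None:
            primers[key] = line
    return primers

def get_pair(forward, reverse, barcode_number):
    """Return the (forward, reverse) primers for one barcode number."""
    key = f'skpp-{barcode_number}'
    return forward[key], reverse[key]

# Placeholder sequences for illustration only
fwd_text = ">skpp-1-F\nAAAACCCC\n>skpp-2-F\nGGGGTTTT\n"
rev_text = ">skpp-1-R\nTTTTGGGG\n>skpp-2-R\nCCCCAAAA\n"

forward = parse_primers(fwd_text)
reverse = parse_primers(rev_text)
pair = get_pair(forward, reverse, 2)
print(':'.join(pair))
```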

### Systematic Screen
The systematic screen enables the generation of a library with a predefined set of changes spread across a region of interest. Changes can include indels.

**Parameters:**
- **experiment_name**: Name of the experiment; also used as the prefix of the saved file (`<experiment_name>_Pool.fasta`).  
- **sequence**: Target sequence in which changes are introduced.
  
  *Note: Extra bases must be added to the beginning and end of the target to preserve the retron homology arms
  (e.g., a 90 bp retron likely requires about 45 bp of additional sequence at the beginning to target the first base; otherwise the retron has no homologous sequence to bind on that side).*
- **barcode**: Sequences added to the beginning and end of the oligo for library tagging and final PCR amplification.
- **goldengate_site**: Recognition Sequence of the Type IIS restriction enzyme for golden gate cloning.
- **overhang**: Golden Gate Cloning Sticky ends generated by the Type IIS enzyme.
- **start_position**: Position at which the screen starts.
- **window_size**: Number of nucleotides, counted from the current position, that are replaced with sequences from the pool.
  - If the window size equals the extension size, the result is a mutation.
  - A window larger than the extension leads to deletions and/or mutations, depending on the extension.
  - A window smaller than the extension leads to insertions and/or mutations, depending on the extension.
- **step_size**: Distance the window is moved before the extensions are applied again. Defaults to three to match a codon.
- **end_position**: Position at which the screen ends.
- **retron_size**: Target size of the final retron.
- **extension**: Defines the pool of bases that are systematically inserted in the specified window.
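
To make the interplay of `window_size`, `extension` and `retron_size` concrete, here is a small sketch under simplifying assumptions. It expands an IUPAC extension into concrete inserts, splits the remaining retron length into two homology arms the same way the notebook does, and builds one retron. `expand_extension`, `classify`, `arm_lengths` and `build_retron` are hypothetical helpers, and `classify` reports only the net length change, which is coarser than the description above.

```python
from itertools import product

# IUPAC codes, mirroring the notebook's base_mapping
IUPAC = {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'U': 'T',
         'R': 'GA', 'Y': 'CT', 'S': 'GC', 'W': 'AT',
         'K': 'GT', 'M': 'CA', 'N': 'ATCG'}

def expand_extension(extension):
    """Expand an IUPAC string into every concrete insert."""
    return [''.join(c) for c in product(*(IUPAC[b] for b in extension.upper()))]

def classify(window_size, insert):
    """Name the net edit by comparing window and insert length."""
    if len(insert) == window_size:
        return 'mutation'
    return 'deletion' if len(insert) < window_size else 'insertion'

def arm_lengths(retron_size, insert):
    """Split the bases not used by the insert into left and right arms."""
    size = retron_size - len(insert)
    return size // 2, size - size // 2

def build_retron(sequence, start, window_size, insert, retron_size):
    """Replace sequence[start:start+window_size] with insert, keeping both arms."""
    left, right = arm_lengths(retron_size, insert)
    end = start + window_size
    if start - left < 0 or end + right > len(sequence):
        raise ValueError('sequence needs more flanking bases for the homology arms')
    return sequence[start - left:start] + insert + sequence[end:end + right]

# Toy target: 50 flanking bases on each side of a 3 nt site
target = 'A' * 50 + 'CCC' + 'G' * 50
inserts = expand_extension('AR')
retron = build_retron(target, start=50, window_size=3, insert='ATG', retron_size=90)
print(inserts, arm_lengths(90, 'ATG'), len(retron), classify(3, 'ATG'))
```

The explicit bounds check shows why flanking bases are needed: without them the left arm would start before the first base of the sequence.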


### Linker Screen
The linker screen enables the user to rapidly generate constructs with varying linker types defined in the linker pool at a given position.

**Parameters:**
- **experiment_name**: Name of the experiment; also used as the prefix of the saved file (`<experiment_name>_Pool.fasta`).  
- **sequence**: Target nucleotide sequence in which changes are introduced.
- **barcode**: Nucleotide sequences added to the beginning and end of the oligo for library tagging and final PCR amplification.
- **goldengate_site**: Recognition Sequence of the Type IIS restriction enzyme for golden gate cloning.
- **overhang**: Golden Gate Cloning Sticky ends generated by the Type IIS enzyme.
- **target_position**: Position at which the linker pool will be applied.
- **retron_size**: Target size of the final retron.
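- **linker_pool**: Comma-separated list of linker amino acid sequences. '-' and '+' mark deletions upstream or downstream of the target site, respectively; each symbol removes one codon (3 nt). An 'X' generates every possible amino acid at that position.
- **species**: Codon table used to convert amino acids to codons. All options use codons preferred in *E. coli* (`ecoli`, `ecoli_hydrophilic`, `ecoli_hydrophobic`).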
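
The way a `linker_pool` string turns into inserts and deletions can be sketched as below. The codon table is copied from the notebook's `ecoli_pref`; `expand_linker` and `parse_linker_pool` are hypothetical helpers, and the sketch uses a single table for both fixed residues and 'X', which simplifies the notebook's handling of `species`.

```python
from itertools import product

# Preferred E. coli codons, copied from the notebook's ecoli_pref table
ECOLI_PREF = {
    'A': 'GCG', 'R': 'CGT', 'N': 'AAC', 'D': 'GAT', 'C': 'TGC',
    'Q': 'CAG', 'E': 'GAA', 'G': 'GGC', 'H': 'CAT', 'I': 'ATT',
    'L': 'CTG', 'K': 'AAA', 'M': 'ATG', 'F': 'TTT', 'P': 'CCG',
    'S': 'AGC', 'T': 'ACC', 'W': 'TGG', 'Y': 'TAT', 'V': 'GTG',
}

def expand_linker(linker, table=ECOLI_PREF):
    """Return codon strings for one linker; 'X' expands to every residue in the table."""
    choices = []
    for aa in linker.upper():
        if aa == 'X':
            choices.append(list(table.values()))
        else:
            choices.append([table[aa]])
    return [''.join(combo) for combo in product(*choices)]

def parse_linker_pool(linker_pool):
    """Split the pool into insertions (codon strings) and deletions (direction, nt)."""
    insertions, deletions = [], []
    for linker in linker_pool.split(','):
        linker = linker.strip()
        if not linker:
            continue
        if set(linker) <= {'-'}:
            deletions.append(('upstream', 3 * len(linker)))
        elif set(linker) <= {'+'}:
            deletions.append(('downstream', 3 * len(linker)))
        else:
            insertions.extend(expand_linker(linker))
    return insertions, deletions

insertions, deletions = parse_linker_pool('GG,GXG,-,++')
print(len(insertions))  # 1 variant for GG plus 20 for GXG
print(deletions)
```

Because every 'X' multiplies the number of variants by the table size, linkers with several 'X' positions grow the library quickly.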




In [None]:
# @title
import itertools as it
from itertools import product
import re
import os

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'



codon_to_aa = {'TTT':'F','TTC':'F','TTA':'L','TTG':'L',
              'CTT':'L','CTC':'L','CTA':'L','CTG':'L',
              'ATT':'I','ATC':'I','ATA':'I','ATG':'M',
              'GTT':'V','GTC':'V','GTA':'V','GTG':'V',

               'TCT':'S','TCC':'S','TCA':'S','TCG':'S',
              'CCT':'P','CCC':'P','CCA':'P','CCG':'P',
              'ACT':'T','ACC':'T','ACA':'T','ACG':'T',
              'GCT':'A','GCC':'A','GCA':'A','GCG':'A',

               'TAT':'Y','TAC':'Y','TAA':'\n STOP \n','TAG':'\n STOP \n',
              'CAT':'H','CAC':'H','CAA':'Q','CAG':'Q',
              'AAT':'N','AAC':'N','AAA':'K','AAG':'K',
              'GAT':'D','GAC':'D','GAA':'E','GAG':'E',

               'TGT':'C','TGC':'C','TGA':'\n STOP \n','TGG':'W',
               'CGT':'R','CGC':'R','CGA':'R','CGG':'R',
               'AGT':'S','AGC':'S','AGA':'R','AGG':'R',
               'GGT':'G','GGC':'G','GGA':'G','GGG':'G',

              }

invert_base_seq = {'A':'T','T':'A','G':'C','C':'G'}

base_mapping = {'R': ['G','A'],
                'Y': ['C','T'],
                'S': ['G','C'],
                'W': ['A','T'],
                'K': ['G','T'],
                'M': ['C','A'],
                'A':['A'],
                'G':['G'],
                'C':['C'],
                'T':['T'],
                'U':['T'],
                'N':['A','T','C','G']}

ecoli_pref = {
            "A": 'GCG',
            "R": 'CGT',
            "N": 'AAC',
            "D": 'GAT',
            "C": 'TGC',
            "Q": 'CAG',
            "E": 'GAA',
            "G": 'GGC',
            "H": 'CAT',
            "I": 'ATT',
            "L": "CTG",
            "K": 'AAA',
            "M": 'ATG',
            "F": "TTT",
            "P": 'CCG',
            "S": 'AGC',
            "T": 'ACC',
            "W": 'TGG',
            "Y": "TAT",
            "V": 'GTG',
}
ecoli_pref_hydrophilic = {
#             "A": 'GCG',
            "R": 'CGT',
            "N": 'AAC',
            "D": 'GAT',
            "C": 'TGC',
            "Q": 'CAG',
            "E": 'GAA',
            "G": 'GGC',
            "H": 'CAT',
#             "I": 'ATT',
#             "L": "CTG",
            "K": 'AAA',
#             "M": 'ATG',
#             "F": "TTT",
            "P": 'CCG',
            "S": 'AGC',
            "T": 'ACC',
#             "W": 'TGG',
#             "Y": "TAT",
#             "V": 'GTG',
}
ecoli_pref_hydrophobic = {
            "A": 'GCG',
#             "R": 'CGT',
#             "N": 'AAC',
#             "D": 'GAT',
#             "C": 'TGC',
#             "Q": 'CAG',
#             "E": 'GAA',
#             "G": 'GGC',
#             "H": 'CAT',
            "I": 'ATT',
            "L": "CTG",
#             "K": 'AAA',
            "M": 'ATG',
            "F": "TTT",
#             "P": 'CCG',
#             "S": 'AGC',
#             "T": 'ACC',
            "W": 'TGG',
            "Y": "TAT",
            "V": 'GTG',
}
conv_type_dict = {'ecoli':ecoli_pref,
                 'ecoli_hydrophilic':ecoli_pref_hydrophilic,
                 'ecoli_hydrophobic':ecoli_pref_hydrophobic}

special_token_mapping = {'ecoli': list(ecoli_pref.values()),
                        'ecoli_hydrophilic': list(ecoli_pref_hydrophilic.values()),
                        'ecoli_hydrophobic': list(ecoli_pref_hydrophobic.values())}
aa_to_codon = {}

for key, value in codon_to_aa.items():
    if aa_to_codon.get(value) is None:
        aa_to_codon[value] = [key]
    else:
        aa_to_codon[value].append(key)


def aa_to_codon_convert(sequence,conv_type):
    '''Back-translate an amino acid sequence using the codon table named by conv_type.

    Returns None if any residue is missing from the selected table.
    '''
    codon_list = []
    for letter in sequence:
        codon = conv_type_dict[conv_type].get(letter.upper())
        if codon is None:
            return None
        codon_list.append(codon)
    return ''.join(codon_list)


def convert_to_aa(codon_list:list):
    '''Translate a list of codons into a list of amino acid symbols.'''
    return [codon_to_aa[codon] for codon in codon_list]


def gen_fasta(overall_pool):
    '''Yield one FASTA record per (name, sequence) entry of overall_pool.'''
    for names, sequences in overall_pool.items():
        yield (f'>{names}\n{sequences}\n')

def generate_X_variants(sequence,species):
    '''Expand every 'X' in an amino acid sequence into all residues of the species table.

    Returns a list of codon strings, one per variant.
    '''
    aa_buffer = [[]]
    for aa in sequence:
        if aa.upper() == 'X':
            # Branch every partial sequence into one copy per allowed residue
            new_buffer = []
            for cur_seq in aa_buffer:
                for aa2 in 'ARNDCQEGHILKMFPSTWYV':
                    codon = aa_to_codon_convert(aa2,species)
                    if codon is not None:
                        new_buffer.append([*cur_seq,codon])
            aa_buffer = new_buffer
        else:
            # Fixed residues always use the full E. coli table, so they are kept
            # even when species selects a restricted (hydrophilic/hydrophobic) table
            for cur_seq in aa_buffer:
                cur_seq.append(aa_to_codon_convert(aa,conv_type='ecoli'))
    return [''.join(seq) for seq in aa_buffer]

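def generate_retrons(sequence: str,target_position: list,extension: str = None, predefined_pool: list = None,retron_size = 90, #bp
                     barcodes = ('',''),overhangs=('ATTC','GTAC'), goldengate_sites = ('CGTCTCA','TGAGACG'),):
    '''
    Build barcoded retron oligos that place each insert at the target position.

    :param sequence: target nucleotide sequence, including flanking bases for the homology arms
    :param extension: ':'-separated parts, each a string of IUPAC bases or a special token (e.g. 'ecoli')
    :param target_position: int (pure insertion site) or [start, end] (window replaced by the insert)
    :param retron_size: total retron length in bp (homology arms plus insert)
    :param barcodes: (left, right) barcode sequences
    :param goldengate_sites: (left, right) Type IIS recognition sequences
    :param overhangs: (left, right) Golden Gate overhangs
    :return: list of (insert, fwd_primer, rev_primer, '|'-delimited oligo) tuples
    '''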



    retron_pool = []

    bl,br = barcodes
    ovl,ovr = overhangs
    goldl,goldr = goldengate_sites



    if extension is not None:
        seqs = []
        for part in extension.split(':'):
            if special_token_mapping.get(part,None) is None:
                seqs.extend([base_mapping[base] for base in part])
            else:
                seqs.append(special_token_mapping[part])
        combinations = list(product(*seqs))
    elif predefined_pool is not None:
        # No extension given: fall back to the caller-supplied inserts
        combinations = predefined_pool

    else:
        raise ValueError('Need an extension or a predefined pool')

    insert_pool = [''.join(combo) for combo in combinations]


    size = retron_size-len(insert_pool[0])
    retron_size_left,retron_size_right = (size//2,size//2) if size%2 == 0 else (size//2,(size//2)+1)


    if isinstance(target_position,int):
        print(f"\n{sequence[target_position-retron_size_left:target_position]}{bcolors.FAIL}{bcolors.BOLD}!INSERT!{bcolors.ENDC}{sequence[target_position:target_position+retron_size_right]}")
        for insert in insert_pool:
            retron = f"{sequence[target_position-retron_size_left:target_position]}|{insert}|{sequence[target_position:target_position+retron_size_right]}"
            assert len(retron)-2 == retron_size, 'Retron length mismatch: add flanking bases so both homology arms fit inside the sequence'
            # print(f"{sequence[target_position-retron_size_left:target_position]}{bcolors.OKGREEN}{insert}{bcolors.ENDC}{sequence[target_position:target_position+retron_size_right]}")



            fwd_primer = f'{bl}{goldl}{ovl}'
            rev_primer = ''.join([invert_base_seq[nt] for nt in f'{ovr}{goldr}{br}'[::-1]])


            retron_pool.append((insert,
                                fwd_primer,
                               rev_primer,
                               f'{bl}|{goldl}|{ovl}|{retron}|{ovr}|{goldr}|{br}'))
    else:
        start,end = target_position
        if (end-start)%3 != 0:
            print(f'{bcolors.WARNING}[WARNING] - The current selection might lead to a frameshift!{bcolors.ENDC}')


        print(f"{sequence[start-retron_size_left:start]}{bcolors.FAIL}{bcolors.BOLD}{sequence[start:end]}{bcolors.ENDC}{sequence[end:end+retron_size_right]}")
        for insert in insert_pool:

            retron = f"{sequence[start-retron_size_left:start]}|{insert}|{sequence[end:end+retron_size_right]}"
            assert len(retron)-2 == retron_size, 'Retron length mismatch: add flanking bases so both homology arms fit inside the sequence'
            # print(f"{sequence[start-retron_size_left:start]}{bcolors.OKGREEN}{insert}{bcolors.ENDC}{sequence[end:end+retron_size_right]}")



            fwd_primer = f'{bl}{goldl}{ovl}'
            rev_primer = ''.join([invert_base_seq[nt] for nt in f'{ovr}{goldr}{br}'[::-1]])

            retron_pool.append((insert,
                                fwd_primer,
                               rev_primer,
                               f'{bl}|{goldl}|{ovl}|{retron}|{ovr}|{goldr}|{br}'))

    return retron_pool

# WIP
# updated_pool = []
# for insert in insert_pool:
#     opti = ecoli_pref.get(codon_to_aa[insert],None)
#     if opti is not None:
#         updated_pool.append(opti)
# list(set(updated_pool))

from urllib.request import urlretrieve
url_fwd = 'https://raw.githubusercontent.com/schmidt-lab/SPINE/master/SPINE/data/forward_finalprimers.fasta'
url_rev = 'https://raw.githubusercontent.com/schmidt-lab/SPINE/master/SPINE/data/reverse_finalprimers.fasta'

if not os.path.isfile('forward_finalprimers.fasta') or not os.path.isfile('reverse_finalprimers.fasta'):
    print('Downloading barcoding lists')
    urlretrieve(url_fwd,'forward_finalprimers.fasta')
    urlretrieve(url_rev,'reverse_finalprimers.fasta')
barcode_dict = {}
with open('forward_finalprimers.fasta','r') as f:
    for line in f:
        if line.startswith('>'):
            key = '-'.join(line.strip().split('-')[:2])
        else:
            barcode_dict[key] = line.strip()
with open('reverse_finalprimers.fasta','r') as f:
    for line in f:
        if line.startswith('>'):
            key = '-'.join(line.strip().split('-')[:2])
        else:
            barcode_dict[key] = [barcode_dict[key.replace('F','R')],line.strip()]



## Barcode Generator

The maximum barcode number is 3000.

In [None]:
barcode_number = 11 # @param {type:"integer"}
print(':'.join(barcode_dict[f'>skpp-{barcode_number}']))

## Systematic Screen

In [None]:
# @title Input parameters here:

experiment_name = '' #@param {type:"string"}

sequence = '' #@param {type:"string"}

barcode = 'TTATAATCATCCTCCCCGGC:CCAAATAGGATGTGTGCTCG' #@param {type:"string"}
goldengate_site = 'CGTCTCA:TGAGACG'#@param {type:"string"}
overhang = 'ATTC:GTAC'#@param {type:"string"}
start_position = 0#@param {type:"integer"}
window_size = 3 #@param {type:"integer"}
step_size = 3 #@param {type:"integer"}
end_position = 10 #@param {type:"raw"}
retron_size = 90 #@param {type:"integer"}
extension = 'ATG'#@param {type:"string"}
#species = 'all' #@param ["all","ecoli"]

goldengate_site = goldengate_site.split(':')
overhang = overhang.split(':')
barcode = barcode.split(':')
seqs = []
for part in extension.split(':'):
    if special_token_mapping.get(part,None) is None:
        seqs.extend([base_mapping[base] for base in part])
    else:
        seqs.append(special_token_mapping[part])
combinations = list(product(*seqs))

insert_pool = [''.join(combo) for combo in combinations]
print(f'[INFO] - Length provided sequence: {len(sequence)} Length insert: {len(extension)}\
 Total: {len(sequence) + len(extension)}')

print('[INFO] - Inserts:',insert_pool)

print(f'Barcodes: {barcode}')

if end_position is None:
  end_position = start_position

overall_pool = {}

for pos in range(start_position,end_position+1,step_size):  # end_position is inclusive; never step past it
  if window_size != 0:
    target_position = [pos,pos+window_size]
  else:
    target_position = pos

  print(f'\nPosition - [{pos}] - POOL:\n')
  retronpool = generate_retrons(sequence=sequence.upper(),
                          extension=extension,
                          target_position=target_position,
                          retron_size=retron_size,
                          barcodes=barcode,
                          goldengate_sites=goldengate_site,
                          overhangs=overhang)


  for insert, fwd,rev,seq in retronpool:
    b1,gg1,ov1,left,var,right,ov2,gg2,b2 = split_seq = seq.split('|')
    if len(insert) == 0:
        kind = 'Del'
    elif window_size-len(insert) == 0:
        kind = 'Mut_'
    else:
        kind = 'Ins_'
    overall_pool[f'{experiment_name}_{pos}_{kind}{insert}'] = seq.replace('|','')

#     print(f"{bcolors.OKCYAN}{b1}{bcolors.ENDC}{bcolors.OKBLUE}{gg1}{bcolors.ENDC}\
# {bcolors.OKGREEN}{ov1}{bcolors.ENDC}{left}{bcolors.FAIL}{var}{bcolors.ENDC}\
# {right}{bcolors.OKGREEN}{ov2}{bcolors.ENDC}{bcolors.OKBLUE}{gg2}{bcolors.ENDC}\
# {bcolors.OKCYAN}{b2}{bcolors.ENDC}")
#   print('\n')
#   print(f'FWD: {fwd}\nREV: {rev}')
#   print('\n--------------------------\n')


## Saving

In [None]:
with open(f'{experiment_name}_Pool.fasta','w') as f:
    for line in gen_fasta(overall_pool):
        f.write(line)

In [None]:
raise ValueError('Stop')

## Linkers at static site

In [None]:
# @title Input parameters here:

experiment_name = '' #@param {type:"string"}

sequence = '' #@param {type:"string"}

barcode = 'ATATAGATGCCGTCCTAGCG:TGGGCACAGGAAAGATACTT' #@param {type:"string"}
goldengate_site = 'CGTCTCA:TGAGACG'#@param {type:"string"}
overhang = 'ATTC:GTAC'#@param {type:"string"}
target_position = 0#@param {type:"integer"}

retron_size = 0 #@param {type:"integer"}
linker_pool = 'GG,PP,GXG,GGSG,GSGG,GPPG,GSGSG,GPPPG,GSGGSG,GPPPPG,-,--,---,+,++,+++,X'#@param {type:"string"}
species = 'ecoli' #@param ["ecoli","ecoli_hydrophilic","ecoli_hydrophobic"]


linker_pool = linker_pool.split(',')

combinations = []
for linker in linker_pool:
    if 'X' in linker.upper():
        combinations.extend(generate_X_variants(linker,species))

    elif '-' in linker.upper() or '+' in linker.upper():
        combinations.append(3*(linker))
    else:
        combinations.append(aa_to_codon_convert(linker,species))


start_position = end_position = target_position
window_size = 0

goldengate_site = goldengate_site.split(':')
overhang = overhang.split(':')
barcode = barcode.split(':')
# seqs = [base_mapping[base] for base in extension]
# combinations = list(product(*seqs))

insert_pool = [''.join(combo) for combo in combinations]
print(f'[INFO] - Length provided sequence: {len(sequence)}\
 Total: {len(sequence) + 3}')
print('[INFO] - Inserts:',insert_pool)
print(f'Barcodes: {barcode}')
if end_position is None:
  end_position = start_position

overall_pool = {}
pos = target_position
if window_size != 0:
    target_position = [pos,pos+window_size]
else:
    target_position = pos

print(f'\nPosition - [{pos}] - POOL:\n')
retronpool = []
for extension in combinations:

    if '-' in extension:
        # '-' marks a deletion of len(extension) nt upstream of the target site
        w = len(extension)
        direction = 'up_'
        target_position = [pos-w-window_size,pos-window_size]
        extension = ''
        print(target_position,window_size)
    elif '+' in extension:
        # '+' marks a deletion of len(extension) nt downstream of the target site
        w = len(extension)
        direction = 'down_'
        target_position = [pos,pos+w]
        extension = ''
        print(target_position,window_size)
    else:
        target_position = pos
        direction = ''
        w = window_size
    retronpool = generate_retrons(sequence=sequence.upper(),
                          extension=extension,
                          target_position=target_position,
                          retron_size=retron_size,
                          barcodes=barcode,
                          goldengate_sites=goldengate_site,
                          overhangs=overhang)


    for insert, fwd,rev,seq in retronpool:
        b1,gg1,ov1,left,var,right,ov2,gg2,b2 = split_seq = seq.split('|')
        if len(insert) == 0:
            # Include the direction so that up- and downstream deletions of the
            # same length do not overwrite each other in overall_pool
            kind = f'Del_{direction}{w}'
        elif window_size-len(insert) == 0:
            kind = 'Mut_'
        else:
            kind = 'Ins_'
        overall_pool[f'{experiment_name}_{pos}_{kind}{insert}'] = seq.replace('|','')



## Saving

In [None]:
with open(f'{experiment_name}_Pool.fasta','w') as f:
    for line in gen_fasta(overall_pool):
        f.write(line)

----