# Design of a BASIC linker for the backbone collection

This notebook contains code relating to the design of a linker to join parts needed to generate BASIC DNA assembly backbones.

## Aims and objectives for cells below

- Generate a unique job ID for the r2o job specification.

In [36]:
import hashlib
import basicsynbio as bsb
from basicsynbio.utils import all_feature_values, feature_from_qualifier
from basicsynbio.cam import seqrecord_hexdigest
import re
from pathlib import Path

In [10]:
print(hashlib.md5("basic_bb_linker_001".encode("UTF-8")).hexdigest())


92f0817faf716045872b354b5b2482a7


## Aims and objectives for cells below

- Generate sequence for specification file.

## Notes on sequence generation

- The original neutral linker designs were 45 bp long. Adding the 8 bases required for ligation to BsaI-digested parts yielded 53 bp sequences.



In [12]:
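# Sequence template: fixed flanks around 45 variable (N) positions.
# Named linker_template to avoid shadowing the built-in format().
linker_template = "GGCTCG" + 45 * "N" + "GTCC"
print("Format " + linker_template)
assert len(linker_template) == 53 + 2  # 6 + 45 + 4 = 55 characters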

Format GGCTCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTCC


## Notes on parameter generation

- In addition to the required homology to BsaI-digested parts, each prefix and suffix half contained a 12 bp double-stranded region and a 21-base single-stranded (ss) overlap region that drives annealing between linker-ligated parts. The target Tm for the ss region was set to 60 °C.
- The overall target GC content of the linker was set to 40 %, similar to the original linker designs.
- Some forbidden sequences:
    - BsaI restriction site: GGTCTC
    - Prefix 4 bp overhang and rc: GTCC & rc
    - Suffix 4 bp overhang and rc: CTCG & rc
    - Various E. coli promoter sites: (refer to r2o job spec file)
    - G-quadruplex: GGGG
    - Translation initiation site: ATG
    - RBS core: AAAGA
    - SD sequence: AGGAGG
    - IS10 site: NGCTNAGCN
    - IS231 site: GGGNNNNNCCC
    - Further forbidden sequences are detailed in the job specification file.
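
The screening criteria above can be illustrated with a minimal sketch. It is not the r2o implementation: the function names, the `FORBIDDEN` dictionary, and the choice to screen only a candidate variable region are assumptions made for illustration, and only a subset of the listed motifs is included. Each motif is checked on both strands, with `N` treated as any base, and GC content is reported as a fraction for comparison with the 40 % target.

```python
import re

# Only the IUPAC codes that appear in the motifs above are handled here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Illustrative subset of the forbidden motifs listed in the notes.
FORBIDDEN = {
    "BsaI site": "GGTCTC",
    "G-quadruplex": "GGGG",
    "SD sequence": "AGGAGG",
    "IS10 site": "NGCTNAGCN",
    "IS231 site": "GGGNNNNNCCC",
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq, motif):
    """Return sorted 0-based start positions of motif or its reverse complement."""
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        regex = "".join(IUPAC[base] for base in pattern)
        # A lookahead reports overlapping occurrences as well.
        hits.update(m.start() for m in re.finditer("(?=" + regex + ")", seq))
    return sorted(hits)


def screen(seq):
    """Map each forbidden motif name found in seq to its hit positions."""
    seq = seq.upper()
    found = {}
    for name, motif in FORBIDDEN.items():
        positions = motif_hits(seq, motif)
        if positions:
            found[name] = positions
    return found


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

An empty dictionary from `screen` means that none of the motifs in this subset were found on either strand of the candidate.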

## Aims and objectives for cell/s below

- Export key sequences from basicsynbio as a FASTA file. The backbone linker should have low homology to these sequences.

In [42]:
# Avoid homology with existing Biolegio linkers and promoters
collections = (bsb.BASIC_PROMOTER_PARTS["v0.2"], bsb.BASIC_BIOLEGIO_LINKERS["v0.1"])
sequences = [value.basic_slice() for collection in collections for value in collection.values()]


# Avoid homology with the SEVA terminators "SEVA_T0" and "SEVA_T1".

# Avoid homology against the following resistance cassettes: Amp, Kan, Cam, Spec, Tet, Gen (SEVA 1 - 6 inc.)

# Avoid homology against the following oris: p15A, pSC101, pUC and pBR322-ROP (_6 - _9). NB pSC101 has high homology with its temperature-sensitive variant.

seva_abr_nums = list(range(18, 78, 10))  # 18, 28, ..., 68: one per resistance cassette
seva_ori_nums = list(range(16, 20))  # 16 - 19: one per ori
seva_nums = set(seva_abr_nums + seva_ori_nums)
seva_seqrecs = [bsb.BASIC_SEVA_PARTS["v0.1"][str(num)].to_seqrec() for num in seva_nums]
for seva_seqrec in seva_seqrecs:
    for feature in seva_seqrec.features:
        if re.match("SEVA_[a-z]*", feature.qualifiers["label"][0]):
            sequences.append(feature.extract(seva_seqrec))
# Keep the first sequence seen for each feature label, then assign attributes
seen = set()
unique_sequences = []
for obj in sequences:
    label = obj.features[0].qualifiers["label"][0]
    if label not in seen:
        seen.add(label)
        unique_sequences.append(obj)
sequences = unique_sequences
for sequence in sequences:
    sequence.name = sequence.features[0].qualifiers["label"][0]
    sequence.description = sequence.name
    sequence.id = seqrecord_hexdigest(sequence)
path_to_seqs = Path.cwd().parents[0] / "sequences"
bsb.export_sequences_to_file(sequences, path_to_seqs / "fasta_files" / "seqs_to_avoid_for_bb_linker.fasta", "fasta")
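
As a rough illustration of what "low homology" to the exported sequences could mean, the sketch below reports k-mers that a candidate linker shares with each reference sequence on either strand. This is an assumption-laden stand-in, not the method used by r2o: the function names and the default `k=10` are hypothetical, and the references are passed as a plain name-to-sequence mapping rather than read from the FASTA file.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(candidate, references, k=10):
    """Map reference name -> sorted k-mers shared with the candidate.

    The candidate is considered on both strands; references are used as given.
    """
    candidate = candidate.upper()
    candidate_kmers = kmers(candidate, k) | kmers(reverse_complement(candidate), k)
    shared = {}
    for name, ref_seq in references.items():
        common = candidate_kmers & kmers(ref_seq.upper(), k)
        if common:
            shared[name] = sorted(common)
    return shared


# Toy data for illustration only; real references would come from the exported FASTA.
references = {"ref_a": "AAACCCGGGTTTACGT", "ref_b": "TTTTTTTTTTTTTT"}
result = shared_kmers("GGCCCGGGTTAA", references, k=6)
print(result)
```

A smaller `k` makes the check stricter. Exact k-mer matching misses near-identical stretches with mismatches, so an alignment-based tool would be a more thorough check.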



