# Integration of basicsynbio and DNA Chisel

This notebook explores integrating DNA Chisel into basicsynbio for linker design.

## Aims and objectives for the cell/s below

- [x] Try out DNA Chisel with easy to implement constraints.
- [x] Make a Bowtie 2 file for sequences that will be present in the basicsynbio PartLinkerCollections.
- [x] Run DNA Chisel to generate the backbone linker for the Addgene collection.

In [39]:
from dnachisel import (
    AvoidChanges,
    AvoidMatches,
    AvoidPattern,
    DnaOptimizationProblem,
    EnforceGCContent,
    EnforceMeltingTemperature,
    EnzymeSitePattern,
    random_dna_sequence,
)
import basicsynbio as bsb
from Bio import (
    Entrez,
    SeqIO
)
from pathlib import Path

In [40]:
linker_base_sequence = "GGCTCG" + random_dna_sequence(45, seed=123) + "GTCC"
constraints = [
    EnforceMeltingTemperature(mini=50, maxi=65, location=(13, 34)),
    AvoidPattern(EnzymeSitePattern("EcoRI")),
    AvoidPattern(EnzymeSitePattern("SpeI")),
    AvoidPattern(EnzymeSitePattern("XbaI")),
    AvoidPattern(EnzymeSitePattern("PstI")),
    AvoidPattern(EnzymeSitePattern("BsaI")),
    AvoidPattern(EnzymeSitePattern("BsmBI")),
    AvoidPattern("TTGACA"), # E.coli sig70 -35 site
    AvoidPattern("TATAAT"), # E.coli sig70 -10 site
    AvoidPattern("TTGNNNNNNNNNNNNNNNNNNNNTATNNT"), # E.coli sig70 promoter weak consensus
    AvoidPattern("TGGCACGNNNNTTGC"), # E.coli sig54 promoter consensus
    AvoidPattern("GAACTNNNNNNNNNNNNNNNNGTCNNA"), # E.coli sig24 promoter consensus
    AvoidPattern("AAAGA"), # RBS
    AvoidPattern("AGGAGG"), # Shine-Dalgarno sequence or 2xArg bad codon
    AvoidPattern("ATG"), # Start codon
    AvoidPattern("TTATNCACA"), # DnaA binding sites
    AvoidPattern("TGTGANNNNNNTCACANT"), # CAP binding sites
    AvoidPattern("NGCTNAGCN"), # IS10 insertion site
    AvoidPattern("GGGNNNNNCCC"), # IS231 insertion site
    AvoidPattern("(G{3,}[ATGC]{1,7}){3,}G{3,}"), # G-quadruplex
    AvoidPattern("GGGG"), # G-quadruplex
    AvoidChanges(location=(0, 6, 1)),
    AvoidChanges(location=(len(linker_base_sequence) - 4, len(linker_base_sequence), 1))
]
problem = DnaOptimizationProblem(
    sequence=linker_base_sequence,
    constraints=constraints
)
problem.resolve_constraints()
print(problem.constraints_text_summary())
print(linker_base_sequence)
print(problem.sequence)

===> SUCCESS - all constraints evaluations pass
✔PASS ┍ EnforceMeltingTemperature[13-34]
      │ Tm = 52.9
✔PASS ┍ AvoidPattern[0-55](pattern:EcoRI(GAATTC))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:SpeI(ACTAGT))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:XbaI(TCTAGA))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:PstI(CTGCAG))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:BsaI(GGTCTC))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:BsmBI(CGTCTC))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-55](pattern:TTGACA)
      │ Passed. Pattern not found 
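
Several of the patterns above use the IUPAC wildcard `N`. As a rough illustration of what such a pattern matches, the sketch below converts an IUPAC string to a regular expression and scans both strands. It is an independent, standard-library example: the names `IUPAC_TO_REGEX`, `reverse_complement` and `find_pattern` are hypothetical, the test sequence is made up, and DNA Chisel's own pattern search may differ in detail.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pattern(sequence, iupac_pattern):
    """Return (start, strand) for every match; start is a forward-strand index."""
    body = "".join(IUPAC_TO_REGEX[base] for base in iupac_pattern)
    # A lookahead keeps overlapping matches from being skipped
    regex = re.compile("(?=(" + body + "))")
    hits = [(m.start(), "+") for m in regex.finditer(sequence)]
    # Scan the reverse complement and map starts back to forward coordinates
    rc = reverse_complement(sequence)
    offset = len(sequence) - len(iupac_pattern)
    hits += [(offset - m.start(), "-") for m in regex.finditer(rc)]
    return hits


# Made-up sequence containing one DnaA-box-like site (TTATNCACA)
print(find_pattern("AATTATGCACAGG", "TTATNCACA"))  # [(2, '+')]
```

Note that a palindromic pattern, such as most restriction sites, is reported once per strand by this sketch.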

In [41]:
# Group together parts and linkers from collections
parts_linkers_collections = (
    bsb.BASIC_BIOLEGIO_LINKERS["v0.1"].values(),
    bsb.BASIC_PROMOTER_PARTS["v0.2"].values(),
    bsb.BASIC_SEVA_PARTS["v0.1"].values()
)
core_parts_linkers = []
for part_linker_collection in parts_linkers_collections:
    core_parts_linkers += list(part_linker_collection)
# Add E.coli MG1655 genome sequence
Entrez.email = "hainesm6@gmail.com"
# Workaround: disable HTTPS certificate verification unless PYTHONHTTPSVERIFY is set
import os
import ssl
if (not os.environ.get("PYTHONHTTPSVERIFY", "") and
        getattr(ssl, "_create_unverified_context", None)):
    ssl._create_default_https_context = ssl._create_unverified_context
with Entrez.efetch(db="Nucleotide", id="NZ_LR881938.1", rettype="fasta", retmode="text") as handle:
    mg1655 = SeqIO.read(handle, "fasta")
seqs = core_parts_linkers + [mg1655]
# Export all sequences to a single FASTA file
path_to_seqs = Path.cwd().parents[0] / "sequences"
bsb.export_sequences_to_file(
    seqs,
    path_to_seqs / "alternative_formats" / "fasta" / "basic_homology_sequences.fa",
    "fasta"
)
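
The exported FASTA file still has to be turned into an index before `AvoidMatches` can use it, and that step is not shown in this notebook. The sketch below is one possible way to do it from Python. The helper names `build_index_command` and `build_index` are hypothetical, and the builder executable is an assumption: both `bowtie-build` and `bowtie2-build` take a reference FASTA followed by an index prefix, so pass whichever matches the Bowtie version your DNA Chisel installation calls.

```python
import shutil
import subprocess
from pathlib import Path


def build_index_command(fasta, index_prefix, builder="bowtie-build"):
    """Assemble the command line: <builder> <reference.fa> <index_prefix>."""
    return [builder, str(fasta), str(index_prefix)]


def build_index(fasta, index_prefix, builder="bowtie-build"):
    """Run the index builder; raise if the executable is not on PATH."""
    if shutil.which(builder) is None:
        raise FileNotFoundError(f"{builder} not found on PATH")
    # The builder writes several files sharing this prefix, so make the folder first
    Path(index_prefix).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_index_command(fasta, index_prefix, builder), check=True)


# Show the command only; nothing is executed here
print(build_index_command("basic_homology_sequences.fa", "basic_homology"))
```

In this notebook the arguments would be the FASTA path written above and the `bowtie_indexes / "2021-07-22_basic_homology" / "basic_homology"` prefix used by the next cell.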


In [42]:
bowtie_index = (
    path_to_seqs / "alternative_formats" / "bowtie_indexes"
    / "2021-07-22_basic_homology" / "basic_homology"
)
constraints += [AvoidMatches(15, bowtie_index=bowtie_index, mismatches=1)]
problem = DnaOptimizationProblem(
    sequence=linker_base_sequence,
    constraints=constraints
)
problem.resolve_constraints()
assert problem.sequence[:len("GGCTCG")] == "GGCTCG"
assert problem.sequence[-len("GTCC"):] == "GTCC"
print(problem.constraints_text_summary())
print(linker_base_sequence)
print(problem.sequence)


## Aims and objectives for the cell/s below

- Evaluate the generated linker sequence:
     - [ ] What is the sequence of the suffix and prefix long linker oligonucleotides?
     - [ ] What is the sequence of the suffix and prefix adapter linker oligonucleotides?
     - [ ] What is the melting temperature of the overlap region?
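
As a starting point for the last question, the sketch below slices out the overlap region and estimates its melting temperature. The coordinates (13, 34) come from the `EnforceMeltingTemperature` constraint above; everything else is an assumption. The linker is a made-up placeholder rather than the sequence DNA Chisel produced, `wallace_tm` is a hypothetical helper, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is only a crude estimate meant for short oligos, so it will generally not match the Tm that DNA Chisel reports. Biopython's `Bio.SeqUtils.MeltingTemp.Tm_NN` is a nearest-neighbour alternative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Placeholder linker (assumption): fixed flanks from the notebook, arbitrary middle
linker = "GGCTCG" + "TACGATCAGTCAGCTTACGGATCCATTGACCTGAAGTCAGCTTAC" + "GTCC"

# Overlap region, using the same (start, end) as the Tm constraint
overlap = linker[13:34]
print("overlap (top strand):   ", overlap)
print("overlap (bottom strand):", reverse_complement(overlap))
print("Wallace-rule Tm estimate:", wallace_tm(overlap))
```

Replacing `linker` with `problem.sequence` from the cell above would apply the same slice to the generated linker.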