# [![logo](logo.png)](https://github.com/BjornFJohansson/ypkpathway#-ypkpathway) pTA1_PDC1_EcfabH_TEF1_EcfabD_FBA1_EcfabG_RPL22A_EcacpP_TDH3_EcfabF_UTR2_EcfabB_TPI1_EcfabA_PMP3_EcfabZ_ENO2_EcfabI_RPL5_EctesA_RPL16A_EctesB_RPL17A_EcacpS_TMA19

Assembly of 12 transcriptional unit
(single gene expression) vectors into a pathway.

Jupyter notebooks describing the single gene expression vectors are linked
at the end of this document.
The primers needed for each cassette are listed there as well, together with suggested PCR conditions.

![pathway with N genes](pw.png "pathway with N genes")

In [1]:
from pydna.parsers import parse_primers
from pydna.readers import read
from pydna.amplify import pcr
from pydna.assembly import Assembly
from IPython.display import display
from IPython.display import Markdown
from pathlib import Path

The first cassette in the pathway is amplified with standard
primers 577 and 778, the last with
1123 and 578, and all others with 1123 and 778.
Standard primers are listed [here](standard_primers.fasta).
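The primer rule above can be written as a small lookup. The helper below, `primer_pair`, is hypothetical and is not part of pydna or ypkpathway. It returns the standard primer numbers for a cassette, given the cassette's 0-based position and the total number of cassettes.

```python
def primer_pair(index: int, n_cassettes: int) -> tuple[str, str]:
    """Hypothetical helper: (forward, reverse) standard primer numbers
    for the cassette at 0-based `index` in a pathway of `n_cassettes`."""
    if not 0 <= index < n_cassettes:
        raise IndexError("cassette index out of range")
    # Only the first cassette uses forward primer 577; all others use 1123.
    forward = "577" if index == 0 else "1123"
    # Only the last cassette uses reverse primer 578; all others use 778.
    reverse = "578" if index == n_cassettes - 1 else "778"
    return forward, reverse


# The 12-cassette pathway described in this document.
for i in range(12):
    print(i, primer_pair(i, 12))
```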

In [2]:
p = {x.name: x for x in parse_primers("standard_primers.fasta")}
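The dictionary `p` maps primer names such as 577_crp585-557 to primer records. This assumes that standard_primers.fasta is an ordinary FASTA file whose primer name is the first word of each header line. A stdlib-only sketch of the same name-to-sequence lookup follows. `read_fasta` is hypothetical, and pydna's `parse_primers` returns primer objects, not plain strings. The sequences in the example are placeholders, not the real primers.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Hypothetical sketch: map each record's first header word to its sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is taken as the first word after ">".
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped sequence lines into one string per record.
    return {n: "".join(parts) for n, parts in records.items()}


# Placeholder sequences, NOT the real primer sequences.
example = """>577_crp585-557 placeholder
ACGTACGT
ACGT
>778_tp_Eco32I_rev placeholder
GGCCGGCC
"""
p_sketch = read_fasta(example)
print(p_sketch["577_crp585-557"])
```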

Restriction enzymes are imported from the Biopython package.

In [3]:
from Bio.Restriction import FspAI, NotI, PacI

The backbone vector pTA1 is read from a GenBank file. It is linearized
later, in the assembly step, by digestion with [FspAI](https://www.google.com/search?q=FspAI).

In [4]:
backbone = read("pTA1.gb")
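Conceptually, linearizing a circular vector at a unique blunt-cutting site rotates the sequence so that the new ends sit at the cut. The stdlib sketch below makes two assumptions that should be checked against REBASE or the supplier before relying on them. First, FspAI recognizes RTGC^GCAY and cuts bluntly in the middle. Second, the site is palindromic, so only the top strand needs to be searched. `linearize` is a hypothetical helper. In the notebook, the real step is `backbone.linearize(FspAI)`.

```python
import re

# IUPAC codes needed for the (assumed) degenerate FspAI site.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def linearize(circular: str, site: str = "RTGCGCAY", cut_offset: int = 4) -> str:
    """Hypothetical sketch: open a circular sequence at the blunt cut in `site`.

    Assumes a palindromic site (top strand searched only) that occurs once.
    """
    seq = circular.upper()
    # The lookahead lets overlapping matches be found and counted.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")
    # Append the first len(site) - 1 bases so a site spanning the origin is found.
    wrapped = seq + seq[:len(site) - 1]
    hits = [m.start() for m in pattern.finditer(wrapped)]
    if len(hits) != 1:
        raise ValueError(f"expected one site, found {len(hits)}")
    cut = (hits[0] + cut_offset) % len(seq)
    # Rotate so the linear molecule starts and ends at the cut.
    return seq[cut:] + seq[:cut]


print(linearize("AAAAATGCGCATTTTT"))
```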

The cassette_pcr_products variable holds the list of expression
cassette PCR product fragments to be assembled.

In [5]:
cassette_pcr_products = []

The expression cassettes come from a series of single gene expression
vectors. Their GenBank file names are held in the cassette_vectors list,
and the parsed sequences in the template_vectors list.

In [6]:
cassette_vectors = ("""
pYPK0_PDC1_EcfabH_TEF1.gb
pYPK0_TEF1_EcfabD_FBA1.gb
pYPK0_FBA1_EcfabG_RPL22A.gb
pYPK0_RPL22A_EcacpP_TDH3.gb
pYPK0_TDH3_EcfabF_UTR2.gb
pYPK0_UTR2_EcfabB_TPI1.gb
pYPK0_TPI1_EcfabA_PMP3.gb
pYPK0_PMP3_EcfabZ_ENO2.gb
pYPK0_ENO2_EcfabI_RPL5.gb
pYPK0_RPL5_EctesA_RPL16A.gb
pYPK0_RPL16A_EctesB_RPL17A.gb
pYPK0_RPL17A_EcacpS_TMA19.gb
""").split()

In [7]:
cassette_vectors

['pYPK0_PDC1_EcfabH_TEF1.gb',
 'pYPK0_TEF1_EcfabD_FBA1.gb',
 'pYPK0_FBA1_EcfabG_RPL22A.gb',
 'pYPK0_RPL22A_EcacpP_TDH3.gb',
 'pYPK0_TDH3_EcfabF_UTR2.gb',
 'pYPK0_UTR2_EcfabB_TPI1.gb',
 'pYPK0_TPI1_EcfabA_PMP3.gb',
 'pYPK0_PMP3_EcfabZ_ENO2.gb',
 'pYPK0_ENO2_EcfabI_RPL5.gb',
 'pYPK0_RPL5_EctesA_RPL16A.gb',
 'pYPK0_RPL16A_EctesB_RPL17A.gb',
 'pYPK0_RPL17A_EcacpS_TMA19.gb']

In [8]:
template_vectors = [read(v) for v in cassette_vectors]

In [9]:
for tv in template_vectors:
    display(tv)

The standard primers are assigned to variables.
Suggested PCR conditions for each cassette can be found at the end of this document.

In [10]:
fp_first = p['577_crp585-557']
fp = p['1123_New775']
rp = p['778_tp_Eco32I_rev']
rp_last = p['578_crp42-70']

The first cassette in the pathway is amplified with primers 577 and 778.

In [11]:
cassette_pcr_products.append(pcr(fp_first, rp, template_vectors[0]))

The intermediary cassettes are amplified with primers 1123 and 778.

In [12]:
cassette_pcr_products.extend(pcr(fp, rp, v)
                             for v in template_vectors[1:-1])

The last cassette in the pathway is amplified with primers 1123 and 578.

In [13]:
cassette_pcr_products.append(pcr(fp, rp_last, template_vectors[-1]))
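Each `pcr(forward, reverse, template)` call above returns the template region bounded by the two primers. pydna's `pcr` is more general than the stdlib sketch below. The sketch uses a hypothetical `pcr_product` helper and illustrates only the core idea. It assumes a linear template, exact full-length annealing, no 5' tails, and the first binding site of each primer.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product(forward: str, reverse: str, template: str) -> str:
    """Hypothetical sketch: product flanked by two exactly matching primers."""
    start = template.find(forward)
    if start == -1:
        raise ValueError("forward primer does not anneal")
    # The reverse primer anneals to the bottom strand, so its reverse
    # complement is what appears on the top strand, downstream of `start`.
    end = template.find(reverse_complement(reverse), start)
    if end == -1:
        raise ValueError("reverse primer does not anneal downstream")
    return template[start:end + len(reverse)]


template = "AAAA" + "ATGCC" + "GGGGTTTT" + "CCAAT" + "AAAA"
print(pcr_product("ATGCC", "ATTGG", template))
```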

The cassettes are named after the vectors they were amplified from, with the pYPK0_ prefix and the .gb suffix removed.

In [14]:
for cp, ve in zip(cassette_pcr_products, cassette_vectors):
    cp.name = ve[:-3].split("_", maxsplit=1)[1]
    print(cp.name)

PDC1_EcfabH_TEF1
TEF1_EcfabD_FBA1
FBA1_EcfabG_RPL22A
RPL22A_EcacpP_TDH3
TDH3_EcfabF_UTR2
UTR2_EcfabB_TPI1
TPI1_EcfabA_PMP3
PMP3_EcfabZ_ENO2
ENO2_EcfabI_RPL5
RPL5_EctesA_RPL16A
RPL16A_EctesB_RPL17A
RPL17A_EcacpS_TMA19


The cassettes and the linearized plasmid backbone are joined by homologous recombination.
Only shared sequences at least `limit` = 167-47-10 = 110 bp long are considered.

In [15]:
asm = Assembly([backbone.linearize(FspAI)] + cassette_pcr_products,
               limit=167-47-10)
asm

Assembly
fragments..: 6175bp 2780bp 2282bp 1934bp 1504bp 2703bp 2566bp 2000bp 1875bp 1953bp 1713bp 2114bp 1878bp
limit(bp)..: 110
G.nodes....: 26
algorithm..: common_sub_strings
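pydna's Assembly searches for common substrings anywhere in the fragments, which is the `common_sub_strings` algorithm reported above. The hypothetical `terminal_overlap` sketch below is simpler. It only checks whether the end of one fragment exactly matches the start of the next over at least `limit` bp, which is the minimum-homology idea behind the 110 bp setting.

```python
def terminal_overlap(left: str, right: str, limit: int) -> int:
    """Hypothetical sketch: length of the longest suffix of `left` equal to
    a prefix of `right`, or 0 if no such overlap is at least `limit` bp."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Try the longest possible overlap first, down to `limit`.
    for size in range(min(len(left), len(right)), limit - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


print(terminal_overlap("AAAACGTACGT", "ACGTACGTTTTT", limit=4))
```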

There should normally be two candidates of equal size.
These sequences should be identical.

In [16]:
candidates = asm.assemble_circular()
candidates

[Contig(o24164), Contig(o24164)]

In [17]:
candidate, *rest = candidates

In [18]:
candidate.cseguid() == rest[0].cseguid()

True

The assembly figure below shows how the fragments came together.

In [19]:
candidate.figure()

 -|pTA1_lin|124
|           \/
|           /\
|           124|PDC1_EcfabH_TEF1|593
|                                \/
|                                /\
|                                593|TEF1_EcfabD_FBA1|644
|                                                     \/
|                                                     /\
|                                                     644|FBA1_EcfabG_RPL22A|440
|                                                                            \/
|                                                                            /\
|                                                                            440|RPL22A_EcacpP_TDH3|712
|                                                                                                   \/
|                                                                                                   /\
|                                                                                                   712|TDH3_EcfabF_UTR2|6

The candidate vector is synchronized to the 577 primer. This means that
the start of the circular sequence (position 1) is shifted so that it matches that of the backbone vector.

In [20]:
pw = candidate.synced(fp_first)
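Synchronizing a circular sequence amounts to rotating it so that a chosen anchor sequence comes first. The hypothetical `sync_to` helper below shows the idea under simplifying assumptions: the anchor matches the top strand exactly. pydna's `synced` method may handle more cases.

```python
def sync_to(circular: str, anchor: str) -> str:
    """Hypothetical sketch: rotate a circular sequence to start at `anchor`
    (top strand, exact match)."""
    # Extend by len(anchor) - 1 bases so a match spanning the origin is found.
    wrapped = circular + circular[:len(anchor) - 1]
    position = wrapped.find(anchor)
    if position == -1:
        raise ValueError("anchor not found on the top strand")
    return circular[position:] + circular[:position]


print(sync_to("CCAAAATTTTGG", "GGCC"))
```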

The cseguid checksum of the resulting plasmid is calculated for future
reference.
The [cseguid checksum](http://pydna.readthedocs.org/en/latest/pydna.html#pydna.utils.cseguid)
uniquely identifies a circular double-stranded sequence.

In [21]:
pw.cseguid()

'IEz8MnXcVYI9rdrAwzDl5bE-gig'
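A circular checksum works by hashing a canonical form of the sequence that does not depend on the origin or on the strand. The hypothetical `circular_checksum` below takes the lexicographically smallest rotation of either strand and encodes its SHA-1 digest in URL-safe base64. This follows the general cSEGUID approach but is not guaranteed to reproduce the value printed by pydna.

```python
import base64
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def smallest_rotation(seq: str) -> str:
    """Lexicographically smallest rotation (simple O(n^2) version)."""
    return min((seq[i:] + seq[:i] for i in range(len(seq))), default=seq)


def circular_checksum(seq: str) -> str:
    """Hypothetical sketch: origin- and strand-independent SHA-1 checksum."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    # The same canonical string results from any rotation of either strand.
    canonical = min(smallest_rotation(seq), smallest_rotation(rc))
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    # URL-safe base64 of a 20-byte digest, padding removed: 27 characters.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


print(circular_checksum("GATTACA") == circular_checksum("TACAGAT"))
```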

The sequence is given a locus name and a definition listing the backbone followed by the elements of each cassette in pathway order.

In [22]:
pw.locus = "pw"
pw.definition = "pTA1_PDC1_EcfabH_TEF1_EcfabD_FBA1_EcfabG_RPL22A_EcacpP_TDH3_EcfabF_UTR2_EcfabB_TPI1_EcfabA_PMP3_EcfabZ_ENO2_EcfabI_RPL5_EctesA_RPL16A_EctesB_RPL17A_EcacpS_TMA19"
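The definition string is the backbone name followed by the cassette names. An element shared by neighbouring cassettes is written only once: for example, TEF1 ends PDC1_EcfabH_TEF1 and starts TEF1_EcfabD_FBA1. The hypothetical helper `pathway_name` builds the string from the vector file names and refuses to chain cassettes whose shared elements do not match.

```python
from pathlib import Path


def pathway_name(backbone: str, vector_files: list[str]) -> str:
    """Hypothetical sketch: join backbone and cassette names, shared elements once."""
    # "pYPK0_PDC1_EcfabH_TEF1.gb" -> ["PDC1", "EcfabH", "TEF1"]
    parts = [Path(f).stem.split("_")[1:] for f in vector_files]
    name = [backbone] + parts[0]
    for previous, current in zip(parts, parts[1:]):
        # The last element of one cassette must be the first of the next.
        if previous[-1] != current[0]:
            raise ValueError(f"{previous} does not chain to {current}")
        name += current[1:]
    return "_".join(name)


vectors = """
pYPK0_PDC1_EcfabH_TEF1.gb
pYPK0_TEF1_EcfabD_FBA1.gb
pYPK0_FBA1_EcfabG_RPL22A.gb
pYPK0_RPL22A_EcacpP_TDH3.gb
pYPK0_TDH3_EcfabF_UTR2.gb
pYPK0_UTR2_EcfabB_TPI1.gb
pYPK0_TPI1_EcfabA_PMP3.gb
pYPK0_PMP3_EcfabZ_ENO2.gb
pYPK0_ENO2_EcfabI_RPL5.gb
pYPK0_RPL5_EctesA_RPL16A.gb
pYPK0_RPL16A_EctesB_RPL17A.gb
pYPK0_RPL17A_EcacpS_TMA19.gb
""".split()
print(pathway_name("pTA1", vectors))
```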

The sequence is stamped with its cSEGUID checksum,
which can be used to verify the integrity of the sequence file.

In [23]:
pw.stamp("cSEGUID")

IEz8MnXcVYI9rdrAwzDl5bE-gig

The sequence is written to a local GenBank file.

In [24]:
pw.write("pTA1_PDC1_EcfabH_TEF1_EcfabD_FBA1_EcfabG_RPL22A_EcacpP_TDH3_EcfabF_UTR2_EcfabB_TPI1_EcfabA_PMP3_EcfabZ_ENO2_EcfabI_RPL5_EctesA_RPL16A_EctesB_RPL17A_EcacpS_TMA19.gb")



The pathway can be extended by digestion with NotI, PacI, or both,
provided that the enzyme cuts only once in the final pathway sequence.

In [25]:
print(f"NotI cuts {len(pw.cut(NotI))} time(s) and PacI cuts "
      f"{len(pw.cut(PacI))} time(s) in the final pathway.")

NotI cuts 1 time(s) and PacI cuts 2 time(s) in the final pathway.
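The output above shows that PacI cuts the final pathway twice, so only NotI meets the single-cut condition here. For a plain sequence string, the check can be sketched as follows. NotI (GCGGCCGC) and PacI (TTAATTAA) are both palindromic, so searching one strand is enough. `count_sites` and `usable_for_extension` are hypothetical helpers, and the toy sequence is made up.

```python
# Recognition sequences; both are palindromic, so one strand suffices.
SITES = {"NotI": "GCGGCCGC", "PacI": "TTAATTAA"}


def count_sites(circular: str, site: str) -> int:
    """Hypothetical sketch: count `site` in a circular sequence,
    including an occurrence that spans the origin."""
    seq = circular.upper()
    wrapped = seq + seq[:len(site) - 1]
    return sum(wrapped.startswith(site, i) for i in range(len(seq)))


def usable_for_extension(circular: str) -> list[str]:
    """Enzymes from SITES that cut the circular sequence exactly once."""
    return [name for name, site in SITES.items()
            if count_sites(circular, site) == 1]


toy = "GCGGCCGC" + "A" * 10 + "TTAATTAA" + "C" * 5 + "TTAATTAA"
print(usable_for_extension(toy))
```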


### Transcriptional unit (single gene expression) vectors needed.

In [26]:
for cv in cassette_vectors:
    cassette_vector = Path(cv).with_suffix('.ipynb')
    display(Markdown(f"[{cassette_vector}]({cassette_vector})"))

[pYPK0_PDC1_EcfabH_TEF1.ipynb](pYPK0_PDC1_EcfabH_TEF1.ipynb)

[pYPK0_TEF1_EcfabD_FBA1.ipynb](pYPK0_TEF1_EcfabD_FBA1.ipynb)

[pYPK0_FBA1_EcfabG_RPL22A.ipynb](pYPK0_FBA1_EcfabG_RPL22A.ipynb)

[pYPK0_RPL22A_EcacpP_TDH3.ipynb](pYPK0_RPL22A_EcacpP_TDH3.ipynb)

[pYPK0_TDH3_EcfabF_UTR2.ipynb](pYPK0_TDH3_EcfabF_UTR2.ipynb)

[pYPK0_UTR2_EcfabB_TPI1.ipynb](pYPK0_UTR2_EcfabB_TPI1.ipynb)

[pYPK0_TPI1_EcfabA_PMP3.ipynb](pYPK0_TPI1_EcfabA_PMP3.ipynb)

[pYPK0_PMP3_EcfabZ_ENO2.ipynb](pYPK0_PMP3_EcfabZ_ENO2.ipynb)

[pYPK0_ENO2_EcfabI_RPL5.ipynb](pYPK0_ENO2_EcfabI_RPL5.ipynb)

[pYPK0_RPL5_EctesA_RPL16A.ipynb](pYPK0_RPL5_EctesA_RPL16A.ipynb)

[pYPK0_RPL16A_EctesB_RPL17A.ipynb](pYPK0_RPL16A_EctesB_RPL17A.ipynb)

[pYPK0_RPL17A_EcacpS_TMA19.ipynb](pYPK0_RPL17A_EcacpS_TMA19.ipynb)

### Suggested PCR conditions
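The extension times in the programs printed by the cell below follow the stated rate of 45 s/kb of product. For example, 2780 bp × 45 s/kb ≈ 125 s, printed as 2:05. The hypothetical helper `extension_time` reproduces that arithmetic. It truncates to whole seconds, which matches the three times printed below (2:05, 1:42, 1:27) but is an assumption about pydna's rounding.

```python
def extension_time(length_bp: int, seconds_per_kb: float = 45) -> str:
    """Hypothetical sketch: extension time as m:ss, truncated to whole seconds."""
    seconds = int(length_bp * seconds_per_kb / 1000)
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


for bp in (2780, 2282, 1934):
    print(bp, extension_time(bp))
```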

In [27]:
for prd in cassette_pcr_products:
    print("\n\n\n\n")
    print("product name:", prd.name)
    print("forward primer", prd.forward_primer.name)
    print("reverse primer", prd.reverse_primer.name)
    print(prd.program())






product name: PDC1_EcfabH_TEF1
forward primer 577_crp585-557
reverse primer 778_tp_Eco32I_rev
|95°C|95°C               |    |tmf:64.6
|____|_____          72°C|72°C|tmr:53.9
|3min|30s  \ 55.5°C _____|____|45s/kb
|    |      \______/ 2:05|5min|GC 43%
|    |       30s         |    |2780bp





product name: TEF1_EcfabD_FBA1
forward primer 1123_New775
reverse primer 778_tp_Eco32I_rev
|95°C|95°C               |    |tmf:70.4
|____|_____          72°C|72°C|tmr:53.9
|3min|30s  \ 55.6°C _____|____|45s/kb
|    |      \______/ 1:42|5min|GC 44%
|    |       30s         |    |2282bp





product name: FBA1_EcfabG_RPL22A
forward primer 1123_New775
reverse primer 778_tp_Eco32I_rev
|95°C|95°C               |    |tmf:70.4
|____|_____          72°C|72°C|tmr:53.9
|3min|30s  \ 54.6°C _____|____|45s/kb
|    |      \______/ 1:27|5min|GC 40%
|    |       30s         |    |1934bp





product name: RPL22A_EcacpP_TDH3
forward primer 1123_New775
reverse primer 778_tp_Eco32I_rev
|95°C|95°C               | 