# Creation of in vitro insertion parts for the pfa chain in C. cinerea

Now that the promoters have been collected, generated, and run through a quantitative fluorescence analysis, we can create the full list of inserts and the necessary primers for the top 5 promoters and terminators.

This insert will contain: <br>
- PABA marker gene <br>
- PUFA synthesis pathway <br>
    - pfa1, pfa2, pfa3 and a PPTase <br>
- Hygromycin marker gene.

*this is where we insert a cool image of the creation*

## Loading Libraries

## Genetic elements

### HR Domain

The insertion site in C. cinerea will be the spoII site, based on the report: <br> *Non-conventional fungi are efficient heterologous hosts for natural product production.*

### PABA marker

### PUFA Synthesis pathway

### Hygromycin marker

## Primer creation using TEEMI

### Imports

In [9]:
from teemi.design.combinatorial_design import DesignAssembly

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import gridspec
from IPython.display import display
import IPython.core.display

import os
os.chdir("..")

### Fetching everything

In [10]:
from smart_functions import read_fasta_to_dseqrecords

top_promoters = r'notebooks/data/insert_sequnces/promoters.fasta'
top_terminators = r'notebooks/data/insert_sequnces/terminators.fasta'

hr_fa= r'notebooks/data/insert_sequnces/HR.fasta'
m_paba_fa = r'notebooks/data/insert_sequnces/PABA.fasta'
m_hygro_fa = r'notebooks/data/insert_sequnces/Hygromycin.fasta' # hygromycin marker, including promoter and terminator
cds_fa = r'notebooks/data/insert_sequnces/pufa_optimized.fasta' #pfa123 and pptase

promoters, promoter_names = read_fasta_to_dseqrecords(top_promoters)
terminators, terminator_names = read_fasta_to_dseqrecords(top_terminators)

cds_records, cds_names = read_fasta_to_dseqrecords(cds_fa)
m_paba, m_paba_names = read_fasta_to_dseqrecords(m_paba_fa)
m_hygro, m_hygro_names = read_fasta_to_dseqrecords(m_hygro_fa)
hr_records, HR_names = read_fasta_to_dseqrecords(hr_fa)


Nowadays, the FASTA file format is usually understood not to have any such comments, and most software packages do not allow them. Therefore, the use of comments at the beginning of a FASTA file is now deprecated in Biopython.


(1) Modify your FASTA file to remove such comments at the beginning of the file.

(2) Use SeqIO.parse with the 'fasta-pearson' format instead of 'fasta'. This format is consistent with the FASTA format defined by William Pearson's FASTA aligner software. Thie format allows for comments before the first sequence; lines starting with the ';' character anywhere in the file are also regarded as comment lines and are ignored.

(3) Use the 'fasta-blast' format. This format regards any lines starting with '!', '#', or ';' as comment lines. The 'fasta-blast' format may be safer than the 'fasta-pearson' format, as it explicitly indicates which lines are comments. 


#### A quick count of sequences

In [11]:
print(f"Promoters: {len(promoters)}, CDS: {len(cds_records)}, Terminators: {len(terminators)}")

Promoters: 9, CDS: 3, Terminators: 9
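With these counts, the number of promoter/terminator pairs can be checked before any sequences are combined. The sketch below uses placeholder names (`P1`, `T1`, `CDS1`, ...) instead of the loaded records; only the counts come from the output above.

```python
from itertools import product

# Placeholder names standing in for the loaded records (counts from the output above)
promoter_ids = [f"P{i}" for i in range(1, 10)]    # 9 promoters
terminator_ids = [f"T{i}" for i in range(1, 10)]  # 9 terminators
cds_ids = ["CDS1", "CDS2", "CDS3"]                # 3 coding sequences

# Every promoter paired with every terminator
pairs = list(product(promoter_ids, terminator_ids))
print(len(pairs))  # 81

# Each CDS contributes promoter + CDS + terminator to the pathway block
parts_per_pathway = 3 * len(cds_ids)
print(parts_per_pathway)  # 9
```

If every pair is turned into one construct, 81 constructs are expected, which is a quick sanity check for the construct list.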


### Putting it all together in a list

In [12]:
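full_constructs = []    # one list of Dseqrecord parts per promoter/terminator pair
construct_labels = []   # matching human-readable labels

# Assumption: the IndexError raised by the earlier version came from m_paba[1].
# PABA.fasta appears to hold a single record, and the planned insert ends with
# the hygromycin marker, so m_hygro[0] is used in that position instead.
if len(hr_records) < 2 or not m_paba or not m_hygro:
    raise ValueError("Expected two HR arms, one PABA marker and one hygromycin marker")

for p, p_name in zip(promoters, promoter_names):
    for t, t_name in zip(terminators, terminator_names):

        # Wrap every CDS in the same promoter/terminator pair
        cds_block, cds_labels = [], []
        for cds, cds_name in zip(cds_records, cds_names):
            cds_block.extend([p, cds, t])
            cds_labels.extend([p_name, cds_name, t_name])

        # Flank the pathway with the two markers and the two HR arms
        full_constructs.append(
            [hr_records[0], m_paba[0], *cds_block, m_hygro[0], hr_records[1]]
        )

        # str.join only accepts strings, so the label is built from the part names
        label_parts = [HR_names[0], m_paba_names[0], *cds_labels, m_hygro_names[0], HR_names[1]]
        construct_labels.append("-".join(map(str, label_parts)))

print(construct_labels)
print(len(full_constructs), "constructs generated")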

### Combining

In [None]:
TARGET_TM = 55
LIMIT = 13
OVERLAP = 35


design = DesignAssembly(full_constructs, list_of_pads=[], positions_of_pads=[], target_tm=TARGET_TM, limit=LIMIT, overlap=OVERLAP)
variants_df = design.show_variants_lib_df()          
primers_df  = design.primer_list_to_dataframe()      
pcrs_df = design.pcr_list_to_dataframe()


out_dir = os.path.abspath(".")
variants_csv = os.path.join(out_dir, "variants_library.csv")
primers_csv  = os.path.join(out_dir, "primers_list.csv")
pcrs_csv= os.path.join(out_dir, "pcr_plan.csv")

variants_df.to_csv(variants_csv, index=False)
primers_df.to_csv(primers_csv, index=False)
pcrs_df.to_csv(pcrs_csv, index=False)

print(f"Variants: {len(variants_df)} saved -> {variants_csv}")
print(f"Primers:  {len(primers_df)} saved -> {primers_csv}")
print(f"PCRs:     {len(pcrs_df)} saved -> {pcrs_csv}")
display(variants_df.head())
display(primers_df.head())
display(pcrs_df.head())
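To make the primer parameters more concrete, here is a minimal sketch of how a target melting temperature and a minimum length could together decide where a primer ends. It assumes that `LIMIT` is a minimum primer length, as the `limit` argument is in pydna's `primer_design`, and it swaps in the simple Wallace rule for the melting temperature, which is not the calculation TEEMI uses. The function names and the template sequence are made up for illustration.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def pick_primer(template: str, target_tm: float = 55, limit: int = 13) -> str:
    """Extend a forward primer from the template start until it reaches target_tm."""
    for length in range(limit, len(template) + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    return template  # template too short to reach the target


template = "ATGGCTAGCAAGGAGATATACCATGGGC"  # made-up sequence
primer = pick_primer(template)
print(primer, len(primer), wallace_tm(primer))
```

The primer never gets shorter than `limit` bases and stops growing as soon as the estimated melting temperature reaches the target. AT-rich regions therefore give longer primers than GC-rich ones.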