<h1> Promoter Designer (ProD) Tool </h1>

[Link to Article](.)       Van Brempt Maarten, Clauwaert Jim et al.

The ProD tool is designed for the construction of promoter strength libraries in prokaryotes. This [Jupyter Notebook](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/) provides an interactive [Python](https://docs.python.org/3/) environment for constructing libraries in *E. coli*. The tool can also be run locally and is available through [GitHub](https://github.com/jdcla/ProD).

---



Jupyter Notebook
---

1. To use the tool in a Jupyter notebook environment, run the code cells (blocks preceded by `[]:`) sequentially. To run a cell, select it and press `Ctrl+Enter`.

2. Comments in code cells are preceded by `#` and explain what the code does.

3. To download the model's output predictions, go to the dashboard (by clicking the Jupyter logo in the top left corner) and download the output file (default: `my_predictions.csv`).

![dashboard](img/dashboard.png)

4. When running this notebook through [Binder](https://mybinder.org/), changes are not saved between sessions. Make sure to download all generated files. In case of malfunction or unwanted changes, simply start a new session.

---

ProD
---

The Promoter Designer tool is created to construct promoter libraries, further exploiting biological capabilities of the microorganisms that allow for the fine-tuning of genetic circuits. A neural network has been trained on hundreds of thousands of sequences that have been randomized in the **17nt spacer sequence**. Generated promoters, ranging from no expression (strength: `0`) to high expression (strength: `10`), therefore all feature the same UP-region, binding boxes (-35, -10) and untranslated region (UTR).

`
[UP-region][-35-box][spacer][-10-box][ATATTC][UTR]
`

`
[GGTCTATGAGTGGTTGCTGGATAAC][TTTACG][NNNNNNNNNNNNNNNNN][TATAAT][ATATTC][AGGGAGAGCACAACGGTTTCCCTCTACAAATAATTTTGTTTAACTTT]
`
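
To make the layout above concrete, the following minimal sketch embeds a spacer in the fixed promoter context. The helper `assemble_promoter` and the constant names are hypothetical and not part of `ProD.py`; the fixed sequences are copied from the layout shown above.

```python
# Fixed promoter parts, copied from the layout shown above.
UP_REGION = "GGTCTATGAGTGGTTGCTGGATAAC"
BOX_35 = "TTTACG"
BOX_10 = "TATAAT"
LINKER = "ATATTC"  # the fixed [ATATTC] segment following the -10 box
UTR = "AGGGAGAGCACAACGGTTTCCCTCTACAAATAATTTTGTTTAACTTT"

SPACER_LENGTH = 17


def assemble_promoter(spacer):
    """Hypothetical helper: place a 17 nt spacer between the -35 and -10 boxes."""
    if len(spacer) != SPACER_LENGTH:
        raise ValueError(
            f"spacer must have length {SPACER_LENGTH}, got {len(spacer)}"
        )
    return UP_REGION + BOX_35 + spacer + BOX_10 + LINKER + UTR


# Example with an arbitrary, non-degenerate spacer
print(assemble_promoter("TTGCCGGGCCGAAGAGA"))
```

Only the spacer varies between generated promoters; every other segment stays constant.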

The tool is run by calling the function `run_tool`, defined in the `ProD.py` script. After importing it (first code cell), the tool can be run with the following inputs.

`
run_tool(input_data, output_path='my_predictions', lib=True,
    lib_size=5, strengths=range(0, 11), cuda=False)
`
#### **Function arguments**

`input_data (list[str])` : A list containing input samples. All input sequences must be strings of **length 17**. Sequences can be constructed using [**A, C, G, T, R, Y, S, W, K, M, B, D, H, V, N**](https://en.wikipedia.org/wiki/Nucleic_acid_notation). When constructing a library (`lib=True`), only the first sequence is used as the input blueprint (see `Create Library`).

`output_path (string)` (default: my_predictions) : A string specifying the output file. This file contains all information generated when running the tool, including the strength probability scores for each of the classes.


`lib (bool)` (default:True) : Determines the construction of a library (`True`) or the prediction of promoter strength of the input sequences (`False`).

`cuda (bool)` (default:False) : Determines the use of GPU-accelerated computing. Does not work on Binder; requires a local installation.

##### **Only evaluated for `lib=True`**

`lib_size (int)` (default:5) : The number of output spacer sequences for each of the requested promoter strengths.

`strengths (list[int])` (default:[0,1,2,3,4,5,6,7,8,9,10]) : A list of integers determining the promoter strengths present in the library.
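
Because every input sequence must have length 17 and use only the listed nucleotide codes, it can help to check inputs before calling `run_tool`. The sketch below is a hypothetical pre-check and not part of `ProD.py`. It assumes upper-case input, so lower-case letters are reported as unsupported.

```python
# Nucleotide codes accepted for spacers, as listed above.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def check_spacers(spacers, length=17):
    """Hypothetical pre-check: return a list of (index, problem) tuples."""
    problems = []
    for index, seq in enumerate(spacers):
        if len(seq) != length:
            problems.append((index, f"length {len(seq)}, expected {length}"))
        unsupported = sorted(set(seq) - IUPAC_CODES)
        if unsupported:
            problems.append(
                (index, "unsupported characters: " + "".join(unsupported))
            )
    return problems


# An empty list means every spacer passed both checks.
print(check_spacers(["TTNCCGGGCCGRRGAGA", "AANCCGNNNNCRRGAGA"]))
```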

Read more about [Python](https://docs.python.org/3/) and [Jupyter Notebook](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/)

---
<h2> Load Code </h2>

In [None]:
# Ctrl + Enter to run

# Load code from python file
from ProD import run_tool

---
## Create Library

To create a custom promoter library, a single input blueprint is given that functions as the source from which spacer sequences are evaluated. The tool will run through the following steps:

1. Create all possible sequences from the degenerate input sequence.
2. Determine the promoter strengths and retain all spacer sequences with the requested promoter strengths.
3. Sample promoters to construct the library.
4. Construct a degenerate sequence (library blueprint) from all sequences. For each blueprint, the fraction of sequences classified to each category of strength is given.

If the number of possible sequences from the input sequence exceeds 500,000, a sample of 100,000 spacers is evaluated instead and no library blueprint is created. To attain feasible processing times, a minimal amount of user guidance in the construction of the library blueprint is required.
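
The size of a blueprint can be checked before running the tool. The sketch below counts and enumerates the sequences encoded by a degenerate blueprint (step 1) and collapses a set of sequences back into one degenerate sequence (step 4). All function names are hypothetical, and this is an illustration of the idea under the standard IUPAC nucleotide codes, not the implementation used in `ProD.py`.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> degenerate code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}

MAX_SEQUENCES = 500_000  # threshold stated in the text above


def count_sequences(blueprint):
    """Number of concrete sequences encoded by a degenerate blueprint."""
    return prod(len(IUPAC[code]) for code in blueprint)


def expand(blueprint):
    """Lazily yield every concrete sequence encoded by the blueprint."""
    for bases in product(*(IUPAC[code] for code in blueprint)):
        yield "".join(bases)


def collapse(sequences):
    """Degenerate sequence covering all given equal-length sequences."""
    return "".join(CODE_FOR[frozenset(column)] for column in zip(*sequences))


blueprint = "NNNCGGGNCCNGGGNNN"
total = count_sequences(blueprint)
print(total, "sequences; exceeds limit:", total > MAX_SEQUENCES)
```

The example blueprint contains eight `N` positions, so it encodes 4^8 = 65,536 sequences, which stays below the limit. Note that a collapsed blueprint can encode more sequences than were used to build it, which is presumably why the fraction of sequences per strength category is reported for each blueprint.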

**NOTE:** Promoter strength is divided into 11 ordinal classes ranging from 0 to 10. Overlap between neighbouring class strengths is expected. Therefore, when constructing a library it can be beneficial to group classes together. Specifically, we recommend the following interpretation of four sets of input strengths.
* zero to low expression: `strengths = [0,1,2]`
* low to medium expression: `strengths = [3,4,5]`
* medium to high expression: `strengths = [6,7,8]`
* high to very high expression: `strengths = [9,10]`
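
The recommended grouping can be kept in a small lookup table so that the same labels are used when requesting a library and when interpreting predictions. The names `STRENGTH_GROUPS` and `GROUP_OF_CLASS` are hypothetical and not part of `ProD.py`.

```python
# Recommended grouping of the 11 ordinal strength classes (see the list above).
STRENGTH_GROUPS = {
    "zero to low": [0, 1, 2],
    "low to medium": [3, 4, 5],
    "medium to high": [6, 7, 8],
    "high to very high": [9, 10],
}

# Reverse lookup: strength class -> group label.
GROUP_OF_CLASS = {
    strength: label
    for label, strengths in STRENGTH_GROUPS.items()
    for strength in strengths
}

# A group can then be passed on as the `strengths` argument, for example:
# run_tool(input_data, strengths=STRENGTH_GROUPS["high to very high"])
print(GROUP_OF_CLASS[7])
```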

In [None]:
# Ctrl + Enter to run

# Define custom spacer (must be length 17)
input_data = [
# Add single blueprint
    'NNNCGGGNCCNGGGNNN',
]
# Define strengths
my_strengths = [9,10]
# Run tool
run_tool(input_data, strengths=my_strengths)

##### Outputs can be downloaded: go to the dashboard (by clicking the Jupyter logo in the top left corner) and download the output file (default: `my_predictions.csv`)

---
<h2> Evaluate Custom Spacers</h2>

It is possible to evaluate custom sequences. The input can be given as a list of sequences or as the path to a FASTA file.

In [None]:
# Ctrl + Enter to run

# Define custom spacers (each must be length 17)
input_data = [
    'TTNCCGGGCCGRRGAGA',
    'AANCCGNNNNCRRGAGA',
    'GGCCNAANANACVVVAG',
# Add extra lines if necessary
]
# Run tool
run_tool(input_data, lib=False)

---
<h3> Input Fasta File  </h3>

1. Go to **dashboard** ![dashboard](img/dashboard.png)
2. Go to **upload** ![upload](img/upload.png)
3. Input **file name**

In [None]:
# Ctrl + Enter to run

# Input fasta file location
input_file = "ex_seqs.fa"
# Run tool
run_tool(input_file, lib=False)
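
If the spacers are already available in Python, a FASTA file can also be written directly instead of being uploaded. The sketch below assumes the conventional FASTA layout (a `>` header line followed by the sequence on the next line); the exact layout expected by `run_tool` is not specified here, so treat this as an assumption. The helper `write_fasta`, the record names and the file name `my_spacers.fa` are hypothetical.

```python
def write_fasta(spacers, path):
    """Hypothetical helper: write spacers as FASTA records, one per spacer."""
    with open(path, "w") as handle:
        for number, spacer in enumerate(spacers, start=1):
            # Header line with an arbitrary record name, then the sequence
            handle.write(f">spacer_{number}\n{spacer}\n")


my_spacers = [
    "TTNCCGGGCCGRRGAGA",
    "AANCCGNNNNCRRGAGA",
]
write_fasta(my_spacers, "my_spacers.fa")
```

The resulting path could then be passed to `run_tool` in place of `ex_seqs.fa`.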

---