## `Non-Repetitive Parts Calculator` in Action!

**Authors** Ayaan Hossain and Howard Salis

**Updated** July 27, 2020

Click on the title **`Non-Repetitive` ... Action!** above and press `Shift`+`Enter` together on each cell to follow along and see all the code in action.

This `jupyter` notebook demonstrates the usage of `NRP Calculator` and describes some of its [API](https://github.com/ayaanhossain/nrpcalc/blob/master/docs/DOCS.md) by designing a few commonly used genetic part types in a hypothetical setting; a real design may involve more realistic choices and objectives. The purpose here is only to demonstrate the different features of `NRP Calculator` and the considerations involved in designing thousands of non-repetitive genetic parts. `NRP Calculator` is, however, not limited to the part types discussed here; the possibilities are quite open-ended.

### Notebook Setup

If you [installed](https://github.com/ayaanhossain/nrpcalc#Installation) `nrpcalc` successfully, you have everything you need to follow along.

Let's first import `nrpcalc`.

In [1]:
import nrpcalc

Hopefully, the import worked without throwing up any errors! If you face issues importing, please [open an issue](https://github.com/ayaanhossain/nrpcalc/issues). If everything worked fine, you're ready to follow along. If you do not understand a specific part of this notebook, either open an issue, or please reach the authors via Twitter or Email. We would be happy to answer your questions, and update this notebook in response to your questions, comments or concerns.

In [2]:
print(nrpcalc.__doc__) # show docs


Non-Repetitive Parts Calculator

Automated design and discovery of non-repetitive genetic
parts for engineering stable systems.

Version: 1.2.6

Authors: Ayaan Hossain <auh57@psu.edu>
         Howard Salis  <salis@psu.edu>

NRP Calculator offers two modes of operation:

- Finder Mode: Discover toolboxes of non-repetitive parts
               from a list of candidate parts

-  Maker Mode: Design toolboxes of non-repetitive parts
               based on sequence, structure and model
               constraints

Additionally, a 'background' object is available which can
be used to store background sequences against which parts
discovered or designed are ensured to be non-repetitive.

You can learn more about the two modes and background via
  print(nrpcalc.background.__doc__)
  print(nrpcalc.finder.__doc__)
  print(nrpcalc.maker.__doc__)



In [3]:
import time # time-keeping is important!

### Constraint Based Design of Genetic Parts

Genetic parts exhibit their activity through biophysical interactions that rely on **DNA** or **RNA** sequence motifs, the presence or absence of specific RNA structures, and/or higher-order sequence or structural characteristics. For example, a constitutive σ<sup>70</sup> _E. coli_ promoter sequence will have a high transcription initiation rate when it contains a conserved $-35$ and $-10$ hexamer, separated by a $17$ base pair spacer. Likewise, a bacterial transcriptional terminator will have a high efficiency when it contains a fast-folding, stable RNA hairpin, followed by a U-rich tract.

Such essential characteristics can be flexibly distilled into a set of criteria that every generated genetic part sequence must satisfy. `NRP Calculator` `Maker Mode` accepts three types of genetic part constraints: a degenerate DNA or RNA sequence using the **IUPAC code**; an essential RNA secondary structure using _dot-parenthesis-x_ notation; and a model-based constraint that can be customized to quantify the presence of higher-order interactions or to facilitate the synthesis and construction of the genetic system by excluding sequences (e.g. restriction sites and polymeric sequences). All three constraints may be simultaneously used to specify a set of genetic part sequences with desired functionalities.

As examples, **Supplementary Table 2** of our [publication](https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-020-0584-2/MediaObjects/41587_2020_584_MOESM1_ESM.pdf) (see page 30) lists the design constraints and algorithm outcomes for a wide variety of genetic parts commonly used in synthetic biology, including minimal Pol II promoters, insulated ribosome binding sites, prokaryotic transcriptional terminators, and toehold RNA switches.

In one sense, these genetic part constraints are explicit hypotheses that distill one’s knowledge of gene regulation and biophysics into the simplest possible computable form. In another sense, they are a type of classifier that separates the genetic part sequence space into only two categories: sequences expected to have some amount of genetic part activity versus sequences expected to have minimal to no activity.

> **Note** The constraints may not always be a quantitative prediction of functional activity; experimental characterization is still needed to validate designed parts. In general, it is advantageous to incorporate as much degeneracy into the constraints as possible to design larger toolboxes.

### Non-Repetitive σ<sup>70</sup> Promoters with CRISPRi with `Lmax=12`

We will first design $1000$ brand new promoters for constitutive transcription in prokaryotes, divided into two toolboxes. We want the first toolbox to have $500$ strong promoters, while for the second toolbox we want to design $500$ promoters with variable strength. Additionally, we want these promoters to be CRISPRi repressible for engineering system logic (see [Reis et al. (2019)](https://www.nature.com/articles/s41587-019-0286-9)). Importantly, we will use the findings from [Larson et al. (2013)](https://www.nature.com/articles/nprot.2013.132) to design our CRISPRi.

#### Designing the First Toolbox

The sequence constraint for the first toolbox is defined for strong constitutive transcription (consensus $-35$ and $-10$ hexamers, and an optimal spacing of $17$ bases separating the two). Additionally, a **PAM** is embedded in the $17$-bp spacer to repress transcription initiation via CRISPRi. To enhance initiation, we will also embed a G+C-rich motif into the insulating $20$-bp region upstream of the $-35$ hexamer.

In [4]:
tb1_seq_constraint = ('S'*5    + # G+C-rich motif in Upstream (5 Bases)
                      'N'*15   + # Remaining 15 Bases in Upstream
                      'TTGACA' + # Consensus -35 Hexamer
                      'N'*6    + # First 6 Bases of 17-bp Spacer
                      'CCN'    + # PAM (Next 3 Bases of 17-bp Spacer)
                      'N'*8    + # Remaining 8 Bases of 17-bp Spacer
                      'TATAAT' + # Consensus -10 Hexamer
                      'N'*6)     # Discriminator

Let's review our sequence constraint.

In [5]:
print(tb1_seq_constraint)
print(' '*35 + '-'*20)
print(' '*35 + '{:^20}'.format('sgRNA Target Site'))

SSSSSNNNNNNNNNNNNNNNTTGACANNNNNNCCNNNNNNNNNTATAATNNNNNN
                                   --------------------
                                    sgRNA Target Site  


For promoters, we don't really have any DNA or RNA secondary structure constraint, so it can be all dots, that is, we don't care about the secondary structure anywhere along the sequence (this will change for downstream part design, as we'll see).

In [6]:
tb1_struct_constraint = '.'*len(tb1_seq_constraint)

In [7]:
tb1_struct_constraint

'.......................................................'

Once the design constraints are finalized, it is time to think about the experimental objectives as well. One possible objective might involve eliminating restriction sites from our promoters, so that we may clone them in successfully.

To do that, we can define some _model functions_ that help us generate promoters compatible with our cloning workflow. While we could define functions that explicitly exclude the cut sites we use (say, BamHI or XbaI), it is better to prevent the occurrence of any palindromic hexamer in our parts, since many restriction sites are palindromic hexamers. That way, our promoters can be cloned and used in a variety of scenarios, rather than only in workflows whose specific restriction site motifs happen to be absent.
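
As a quick sense of scale, the sketch below enumerates every DNA hexamer and counts those equal to their own reverse complement. A reverse-complement palindrome of length $6$ is fully determined by its first three bases, so we expect $4^3 = 64$ of the $4^6 = 4096$ hexamers. The helper is redefined here only so the block runs on its own.

```python
from itertools import product

comptab = str.maketrans('ATGC', 'TACG')

def revcomp(seq):
    # reverse the string, then complement each base
    return seq[::-1].translate(comptab)

# all 4^6 hexamers, and the subset that are reverse-complement palindromes
hexamers = [''.join(bases) for bases in product('ACGT', repeat=6)]
palindromes = [h for h in hexamers if h == revcomp(h)]

print(len(hexamers), len(palindromes))  # 4096 64
print('GGATCC' in palindromes)          # True (BamHI)
print('TCTAGA' in palindromes)          # True (XbaI)
```

Excluding all palindromic hexamers therefore removes only about $1.6\%$ of hexamers from consideration.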

> **Note** `NRP Calculator` can optimize two types of functions for us - a **local model function** which is applied on a genetic part concurrently with addition of each nucleotide, and a **global model function** which is applied on a genetic part when it is fully constructed. The local model function must accept a single argument called `seq` (the partial sequence under construction) and return either `(True, None)` if an evaluation was satisfied, or `(False, index)` where `index` is a traceback location where nucleotide choices need to be altered to fulfill an objective. The global model function must also accept a single input `seq` (a fully constructed sequence), and return either `True` if an objective was met, or `False` otherwise.
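
The two return contracts described in the note can be sketched with a pair of toy objectives. The function names, the run-length limit, and the G+C bounds below are hypothetical choices made for illustration, not recommendations from `NRP Calculator`.

```python
MAX_RUN = 4               # hypothetical: longest homopolymer run tolerated
GC_LOW, GC_HIGH = 0.35, 0.65  # hypothetical: acceptable G+C fraction

def prevent_homopolymer(seq):
    """Local model function sketch: reject a partial sequence
    whose last MAX_RUN+1 bases are all the same nucleotide."""
    window = MAX_RUN + 1
    if len(seq) < window:
        return (True, None)                # too short to evaluate yet
    if len(set(seq[-window:])) == 1:       # e.g. 'AAAAA'
        return (False, len(seq) - window)  # revisit choices from the run's start
    return (True, None)

def gc_in_range(seq):
    """Global model function sketch: accept a finished part only
    if its G+C fraction lies within [GC_LOW, GC_HIGH]."""
    gc = sum(base in 'GC' for base in seq) / len(seq)
    return GC_LOW <= gc <= GC_HIGH

print(prevent_homopolymer('ACGT'))       # (True, None)
print(prevent_homopolymer('ACGTAAAAA'))  # (False, 4)
print(gc_in_range('GGCCAATT'))           # True
print(gc_in_range('AAAAAAAT'))           # False
```

The local function returns a tuple with a traceback index, while the global function returns a plain boolean.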

We will now develop a new objective function to prevent palindromic hexamers in our designed promoters to be evaluated concurrently as each new base is added to a promoter towards completion (a local model function). Our function will start evaluation when the **sixth** base (base at `index=5` or equivalently, when `len(seq)=6`) is added to a partial promoter under construction (necessary condition for evaluation), and guide the design process to steer clear of palindromic hexamers. We will also define any helper functions we need, and test our logic as we move forward.

In [8]:
comptab = str.maketrans('ATGC', 'TACG') # our string translation table for complementing strings

In [9]:
# helper function
def revcomp(seq):
    # reverse string, then return its complement
    return seq[::-1].translate(comptab)

In [10]:
assert revcomp('AAATTTGGGCCC') == 'GGGCCCAAATTT' # a quick test for revcomp function shows that it works

In [11]:
assert revcomp('GGGAAATTTCCC') == 'GGGAAATTTCCC' # another quick check on a palindrome confirms definition

In [12]:
# actual local model function
def prevent_cutsite(seq):
    # is our partial sequence long enough for evaluation?
    if len(seq) >= 6:
        # extract current (last-known) 6-mer for evaluation
        hexamer = seq[-6:]
        if hexamer == revcomp(hexamer): # a palindrome!
            state = False      # objective failed
            index = len(seq)-6 # we need to go back 6 bases to alter current hexamer
            return (state, index)
        else:
            return (True, None) # sequence is free of palindromic hexamers
    # otherwise, pass for shorter sequences
    else:
        return (True, None)

The optimization done by this function is straightforward. We check whether the partial sequence under construction ends with a palindromic hexamer. If it does, we ask `Maker` to go back $6$ bases to index `len(seq)-6` and start making alternate nucleotide choices **at** that location. If this function returns `(True, None)` at every location from the sixth base onward, the complete part is necessarily devoid of palindromic hexamers. Note that we also return `(True, None)` for partial sequences shorter than $6$ bases.

`Maker` takes care of calling this function above with the addition of each base, when it is passed as the designated local model function. So, all we ever need to do inside this function is just evaluate the "current" case, i.e., evaluate the hexamer ending at the current index to ensure our optimization.

> **Note** The traceback index should point to the location _starting at which_ nucleotide choices need to be altered. For example, in the illustrated `prevent_cutsite` function above, we need to go back $6$ bases from the current location if the current hexamer is palindromic, and trace a new path through constraint space. This means our traceback index should be `len(seq)-6=0` (the very beginning of our sequence) if our current sequence is of length $6$ and forms a palindromic hexamer.

In [13]:
assert prevent_cutsite('GGATCC') == (False, 0) # BamHI will be prevented

In [14]:
assert prevent_cutsite('TCTAGA') == (False, 0) # XbaI will be prevented

In [15]:
assert prevent_cutsite('GGGGGG') == (True, None)  # non-palindromic hexamer will pass our filter

At this point, we are ready to launch `NRP Calculator` `Maker Mode` to design some promoters for us!

In [16]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox1 = nrpcalc.maker(
    seed=1,                              # reproducible results
    seq_constr=tb1_seq_constraint,       # as defined above
    struct_constr=tb1_struct_constraint, # as defined above
    Lmax=12,                             # as stated in our goal
    target_size=500,                     # as stated in our goal
    part_type='DNA',                     # as stated in our goal
    local_model_fn=prevent_cutsite)      # as defined above

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: SSSSSNNNNNNNNNNNNNNNTTGACANNNNNNCCNNNNNNNNNTATAATNNNNNN
 Structure Constraint: .......................................................
    Target Size      : 500 parts
           Lmax      : 12 bp
  Internal Repeats   : False

 Check Status: PASS

[Checking Arguments]
   Part Type : DNA
 Struct Type : mfe
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [13-mers] 43, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 2, [13-mers] 86, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 3, [13-mers] 129, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 4, [13-mers] 172, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 5, [13-mers] 215, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 6, [13-mers] 258, [iter time] 0.00s, [avg time] 0.00s, [to

 [part] 142, [13-mers] 6106, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 143, [13-mers] 6149, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 144, [13-mers] 6192, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 145, [13-mers] 6235, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 146, [13-mers] 6278, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 147, [13-mers] 6321, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 148, [13-mers] 6364, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 149, [13-mers] 6407, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 150, [13-mers] 6450, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 151, [13-mers] 6493, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 152, [13-mers] 6536, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 153, [13-mers] 6579, [iter time] 0.00s, [avg time] 0.00s,

 [part] 298, [13-mers] 12814, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 299, [13-mers] 12857, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 300, [13-mers] 12900, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 301, [13-mers] 12943, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 302, [13-mers] 12986, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 303, [13-mers] 13029, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 304, [13-mers] 13072, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 305, [13-mers] 13115, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 306, [13-mers] 13158, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 307, [13-mers] 13201, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 308, [13-mers] 13244, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 309, [13-mers] 13287, [iter time] 0.00s, [avg 

 [part] 462, [13-mers] 19866, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 463, [13-mers] 19909, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 464, [13-mers] 19952, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 465, [13-mers] 19995, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 466, [13-mers] 20038, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 467, [13-mers] 20081, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 468, [13-mers] 20124, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 469, [13-mers] 20167, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 470, [13-mers] 20210, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 471, [13-mers] 20253, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 472, [13-mers] 20296, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 473, [13-mers] 20339, [iter time] 0.00s, [avg 

In [17]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 0.67s


The console output shows that all our constraints passed the initial checks and were valid. `Maker` was then able to design $500$ promoters for us based on all given constraints without failure in less than a second. Let's look at some of the parts produced, and compare them with our sequence constraint.

In [18]:
toolbox1[0] # show the first part designed

'CGGGCATTACTCCATTTGTCTTGACACTACACCCTACCTAGCCTATAATCCATGT'

In [19]:
tb1_seq_constraint # show the sequence constraint

'SSSSSNNNNNNNNNNNNNNNTTGACANNNNNNCCNNNNNNNNNTATAATNNNNNN'

In [20]:
toolbox1[499] # show the last part designed

'CCCGGACATCTCTTTCAATTTTGACACAACACCCCAACTGTAGTATAATAGTCGC'
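
The `[13-mers]` counter in the console log above grows by $43$ per part, which matches the number of $(L_{max}+1)$-mers in a $55$-bp sequence: $55 - 13 + 1 = 43$. The sketch below reproduces that arithmetic and shows one way to look for repeated $13$-mers across parts. It is a forward-strand check only, which is an assumption of this sketch; `kmer_count` and `repeated_kmers` are illustrative helpers, not `nrpcalc` functions.

```python
from collections import Counter

def kmer_count(part_length, Lmax):
    # number of (Lmax+1)-mers contained in a single part
    return part_length - (Lmax + 1) + 1

def repeated_kmers(parts, Lmax):
    """Return the set of (Lmax+1)-mers that occur more than
    once across all parts (forward strand only)."""
    k = Lmax + 1
    counts = Counter(
        part[i:i+k]
        for part in parts
        for i in range(len(part) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}

print(kmer_count(part_length=55, Lmax=12))  # 43

# the first and last promoters shown above
first = 'CGGGCATTACTCCATTTGTCTTGACACTACACCCTACCTAGCCTATAATCCATGT'
last  = 'CCCGGACATCTCTTTCAATTTTGACACAACACCCCAACTGTAGTATAATAGTCGC'
print(repeated_kmers([first, last], Lmax=12))
```

For a non-repetitive toolbox, the returned set should be empty; passing `toolbox1.values()` instead of the two-part list would check the whole toolbox.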

The construction looks good, but it's always a good idea to verify explicitly if our design objectives were met. We will define a verification function to check if our completely constructed promoters are indeed devoid of palindromic hexamers.

In [21]:
def final_cutsite_check(seq):
    # grab all hexamers from the sequence
    hexamers = [seq[i:i+6] for i in range(len(seq)-6+1)]
    # if any of the hexamers are palindromic, our
    # design objective was clearly not met!
    for hexamer in hexamers:
        if hexamer == revcomp(hexamer):
            return False # we will reject this part
    # None of the hexamers were palindromic!
    return True  # we will accept this part

In [22]:
# We will loop through the toolbox, and ensure
# all our parts pass the new global check
for promoter in toolbox1.values():
    assert final_cutsite_check(promoter) == True

Every promoter passes our verification, so our design objective for the first toolbox was met. As we will see in the next section, the evaluation function `final_cutsite_check` could have been specified to `Maker` directly via the `global_model_fn` parameter, which would automatically execute the evaluation on a part after it was completely designed, and accept or reject it accordingly.

The benefit of passing this check as a global model function to `Maker` is that the algorithm can adjust the number of trials it needs depending on an auto-estimated probability of evaluation failure.
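
To see why the failure probability of a global check matters, the sketch below estimates how often an unconstrained random $55$-mer would pass `final_cutsite_check`. This is only an illustration of the idea; it makes no claim about how `Maker` performs its own estimate. The check is redefined so the block runs on its own, and `estimate_pass_rate` is a hypothetical helper.

```python
import random

comptab = str.maketrans('ATGC', 'TACG')

def final_cutsite_check(seq):
    # True when no hexamer equals its own reverse complement
    for i in range(len(seq) - 6 + 1):
        hexamer = seq[i:i+6]
        if hexamer == hexamer[::-1].translate(comptab):
            return False
    return True

def estimate_pass_rate(check, length, trials, seed=0):
    # fraction of uniformly random sequences accepted by check
    rng = random.Random(seed)
    passed = sum(
        check(''.join(rng.choice('ACGT') for _ in range(length)))
        for _ in range(trials))
    return passed / trials

rate = estimate_pass_rate(final_cutsite_check, length=55, trials=2000)

# rough expectation: 50 hexamers per part, each palindromic with
# probability 64/4096, treated as independent (an approximation)
approx = (1 - 64 / 4096) ** (55 - 6 + 1)

print('sampled pass rate : {:.2f}'.format(rate))
print('rough expectation : {:.2f}'.format(approx))
```

Under these assumptions, roughly half of all random candidates would be rejected after full construction, which is the kind of waste a local model function avoids by steering choices during construction.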

#### Designing the Second Toolbox - First Attempt

For our second toolbox of promoters, we want to design $500$ variable strength promoters. Our sequence constraint will change accordingly.

In [23]:
tb2_seq_constraint = 'N'*20 + 'TTGNNN' + 'N'*6 + 'CCN' + 'N'*6 + 'WW' + 'WWWWWW' + 'N'*6
#                     -----    ------     -----   ---     ----------     ------     ----
#                     UPS      -35        SPACER  PAM     SPACER         -10        DIS

Notice that we introduced degeneracy in the $-35$ hexamer and opted for just weak bases (A/T) in place of the $-10$ hexamer. The $-10$ hexamer is also preceded by two weak bases, so the effective spacer length can range from $15$ to $17$ bp. We still retain the PAM in the spacer for CRISPRi.

In [24]:
tb2_seq_constraint # review the constraint

'NNNNNNNNNNNNNNNNNNNNTTGNNNNNNNNNCCNNNNNNNWWWWWWWWNNNNNN'

Because things are more degenerate in our present sequence constraint, we might be interested in preventing cryptic hexamers within our promoters.

This is easily done with another local model function that checks whether a hexamer elsewhere within the promoter under construction has fewer mismatches to the consensus motifs than the ones placed (by `Maker`) at the intended `-35` and `-10` locations.

In [25]:
# helper function 1
def hamming(x, y): # score mismatches between two strings
    return sum(x[i] != y[i] for i in range(min(len(x), len(y))))

In [26]:
assert hamming(x='000000', y='111111') == 6 # test case 1

In [27]:
assert hamming(x='000111', y='111111') == 3 # test case 2

In [28]:
# helper function 2
def cryptic_hexamer(cx, hx, dt): # returns True if hx is a cryptic hexamer
    '''
    cx = consensus motif for either -35 or -10
    hx = current hexamer under evaluation
    dt = number of mismatches between cx and current
         motif placed at -35 or -10
    '''
    # if current hexamer (hx) is closer to consensus
    # motif (cx) than the actual selected motif used
    # at the intended location (dt mismatches), then
    # we have a cryptic promoter under construction (True)
    if hamming(cx, hx) < dt:
        return True
    return False

In [29]:
assert cryptic_hexamer(cx='TTGACA', hx='AAAAAA', dt=3) == False # test case 1

In [30]:
assert cryptic_hexamer(cx='TTGACA', hx='TTGAGA', dt=3) == True  # test case 2

In [31]:
# actual local model function
def prevent_cryptic_promoter(seq, c35start, c10start, eval_index=None):
    '''
    seq - partial sequence under construction to be evaluated
    c35start - starting index of -35 hexamer (python indexing)
    c10start - starting index of -10 hexamer (python indexing)
    eval_index - the location ending at which a hexamer is to
                 be evaluated (default=None implies use the
                 last hexamer in current seq, i.e. ending at
                 len(seq))
    '''
    c35 = 'TTGACA' # defined -35 consensus
    c10 = 'TATAAT' # defined -10 consensus
    # sequence long enough to evaluate
    if len(seq) >= 6:
        # current index?
        end = len(seq)
        
        # which hexamer to evaluate?
        if eval_index is None: # no eval_index provided
            eval_index = end   # use the hexamer ending at current index (end)
        # otherwise use appropriate eval_index provided
        
        # extract current / appropriate hexamer
        hx = seq[eval_index-6:eval_index]
        
        # Case: -35 hexamer
        
        # current hexamer is at -35 or -10 location?
        # then skip evaluation for extracted hexamer
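        # (compare eval_index rather than end, so the skip also
        # applies when a finished part is scanned via eval_index)
        if eval_index == c35start+6:
            return (True, None) # skip -35 location
        if eval_index == c10start+6:
            return (True, None) # skip -10 location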
        
        # extract current -35 hexamer
        # if there is one
        s35 = None
        if end > c35start+6: # a -35 motif has been placed
            s35 = seq[c35start:c35start+6]
        
        # set -35 hexamer cutoff
        if s35 is None: # no -35 hexamer present yet
            d35 = 3 # default distance to prevent
        else:
            d35 = hamming(s35, c35) # actual distance to prevent
        
        # evaluate hx for -35 hexamer
        if cryptic_hexamer(cx=c35, hx=hx, dt=d35):
            return (False, end-6) # our current hexamer is a cryptic -35;
                                  # go back 6 bases
        
        # Case: -10 hexamer
        
        # extract current -10 hexamer
        # if there is one
        s10 = None
        if end > c10start+6:
            s10 = seq[c10start:c10start+6]
        
        # set -10 hexamer cutoff
        if s10 is None: # no -10 hexamer present yet
            d10 = 3 # default distance to prevent
        else:
            d10 = hamming(s10, c10) # actual distance to prevent
        
        # evaluate hx for -10 hexamer
        if cryptic_hexamer(cx=c10, hx=hx, dt=d10):
            return (False, end-6) # our current hexamer is a cryptic -10;
                                  # go back 6 bases
        
        # both -35 and -10 checks passed
        return (True, None) # part is OK
    
    # not long enough to evaluate
    return (True, None) # part is OK .. so far

In [32]:
# test case 1 - a partial sequence with last 6 bases very similar to the -35 consensus
assert prevent_cryptic_promoter(seq='GGGGGGGGTTGACT', c35start=20, c10start=20+6+17) == (False, 8)

In [33]:
# test case 2 - a partial sequence with last 6 bases very similar to the -10 consensus
assert prevent_cryptic_promoter(seq='GGGGGGGTATAGT', c35start=20, c10start=20+6+17) == (False, 7)

In [34]:
# test case 3 - a partial sequence with last 6 bases dissimilar to both motifs
assert prevent_cryptic_promoter(seq='GGGGGGGGAAGATC', c35start=20, c10start=20+6+17) == (True, None)

The above local model function `prevent_cryptic_promoter` relies on several smaller functions to make its evaluation, which is fine. The only thing we need to take care of in order to use it with `Maker` is setting the `c35start` and `c10start` parameters, which the function requires (`eval_index` defaults to `None`, so we need not worry about it right now), because `Maker Mode` only works with local model functions that take a single input: the partial sequence under construction (`seq`).

Some obvious solutions would be to hardcode all parameters apart from `seq` inside the local model function, or give them default values like we did for `eval_index`, but we don't really need to do either. Instead, we have two more options.

The first option is to define a lambda function to wrap the above function like so.

In [35]:
prevent_cryptic = lambda seq: prevent_cryptic_promoter(seq=seq, c35start=20, c10start=43)

We could then set `local_model_fn=prevent_cryptic`, and our optimization would go through. This wrapping leaves the underlying function `prevent_cryptic_promoter` free for more general use later. For example, it could power an additional global model function that checks the completely constructed part for cryptic hexamers by re-evaluating each hexamer in turn via the `eval_index` parameter, starting with the hexamer that ends at index $5$ (more on this a little later).

The second option is to define a wrapper function explicitly.

In [36]:
def prevent_cryptic(seq):
    return prevent_cryptic_promoter(
        seq=seq,
        c35start=20,
        c10start=43)
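
For completeness, the standard library's `functools.partial` achieves the same wrapping by pre-binding keyword arguments. Whether `Maker` accepts a `partial` object in place of a plain function is an assumption here and should be checked before relying on it. The `evaluate` function below is a stand-in with the same signature as `prevent_cryptic_promoter`, used only so the block runs on its own.

```python
from functools import partial

def evaluate(seq, c35start, c10start, eval_index=None):
    # stand-in for prevent_cryptic_promoter: it only
    # reports the arguments it actually received
    return (seq, c35start, c10start, eval_index)

# pre-bind everything except seq
prevent_cryptic = partial(evaluate, c35start=20, c10start=43)

print(prevent_cryptic('ACGT'))      # ('ACGT', 20, 43, None)
print(prevent_cryptic(seq='ACGT'))  # ('ACGT', 20, 43, None)
```

One difference from the `def` wrapper is that a `partial` object has no `__name__` attribute, which may matter if a caller inspects the function it is given.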

Notice that we now have two local model functions: (1) the `prevent_cryptic` function and (2) the `prevent_cutsite` function used for the first toolbox. In such multi-objective design scenarios, we have to write a meta local model function that runs these individual local model functions along with any specific parameters. Here's an example:

In [37]:
# a meta local model function
def variable_promoter_local(seq):
    # check the first objective
    outcome,index = prevent_cutsite(seq)
    # return traceback index if objective fails
    if outcome == False:
        return False,index
    # check the second objective
    outcome,index = prevent_cryptic_promoter(
        seq=seq,
        c35start=20,
        c10start=43)
    # return traceback index if objective fails
    if outcome == False:
        return False,index
    # every objective met .. good to go!
    return (True, None)

> **Note** It is important to be careful about string indexing when the function logic becomes moderately complex. For example, in `Python` strings are $0$-indexed, which means that the $-35$ hexamer starts after the first $20$ bases in the upstream region at index $20$ (i.e. the $21$st base belongs to $-35$). It is also important to note when a model function should be evaluated. For example, if the evaluation logic requires at least an $8$-bp sequence, parts shorter than that length should be evaluated `True` to let the sequence grow long enough for evaluation.
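
One way to reduce indexing mistakes is to build the constraint from named segments and derive the start indices from their lengths, rather than hardcoding `20` and `43`. The segment names and the `segment_starts` helper below are illustrative, not part of `nrpcalc`; the segments themselves mirror the second toolbox constraint.

```python
# named pieces of the second toolbox constraint, in order
segments = [
    ('upstream',      'N'*20),
    ('minus35',       'TTGNNN'),
    ('spacer_head',   'N'*6),
    ('pam',           'CCN'),
    ('spacer_tail',   'N'*6 + 'WW'),
    ('minus10',       'WWWWWW'),
    ('discriminator', 'N'*6),
]

def segment_starts(segments):
    """Map each segment name to its 0-based start index,
    and also return the total constraint length."""
    starts, offset = {}, 0
    for name, motif in segments:
        starts[name] = offset
        offset += len(motif)
    return starts, offset

starts, total = segment_starts(segments)
constraint = ''.join(motif for _, motif in segments)

print(starts['minus35'], starts['minus10'], total)  # 20 43 55
print(constraint)
```

The derived values can then be passed as `c35start=starts['minus35']` and `c10start=starts['minus10']`, so they stay correct if a segment length changes.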

> **Note** When there are multiple local objectives at play that evaluate properties of the partial sequence at an upstream location, it is often advantageous to return the traceback index that occurs earlier in the sequence position. For example, one thing we could do in a meta local model function is evaluate two objectives, and return `(False, min(index1, index2))` if both objective functions failed with different traceback locations, or just `(False, index1)` if only the first one failed and so on. We could also weigh the various objectives differently, and choose to return the most important traceback index first, rather than the second most important traceback index etc.
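
The earliest-traceback idea from the note above can be sketched as a small combinator that runs every local objective and returns the minimum failing index. The name `combine_local` and the two toy objectives are hypothetical and exist only to exercise the logic.

```python
def combine_local(*model_fns):
    """Build a meta local model function that evaluates every
    objective and returns the earliest traceback index on failure."""
    def meta_local(seq):
        failed = []
        for fn in model_fns:
            ok, index = fn(seq)
            if not ok:
                failed.append(index)
        if failed:
            return (False, min(failed))
        return (True, None)
    return meta_local

# toy objective 1: partial sequence must not end in 'AAA'
def no_trailing_AAA(seq):
    if seq.endswith('AAA'):
        return (False, len(seq) - 3)
    return (True, None)

# toy objective 2: partial sequence must not end in 'GCAAA'
def no_trailing_GCAAA(seq):
    if seq.endswith('GCAAA'):
        return (False, len(seq) - 5)
    return (True, None)

meta = combine_local(no_trailing_AAA, no_trailing_GCAAA)
print(meta('TTGCAAA'))  # (False, 2): both fail, at indices 4 and 2
print(meta('TTTTAAA'))  # (False, 4): only the first objective fails
print(meta('TTTTTTT'))  # (True, None)
```

Unlike a meta function that returns on the first failure, this version always evaluates every objective, which costs more per call but picks the earliest traceback location.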

> **Note** It is possible to embed external models as evaluators into `NRP Calculator`. For example, rather than only preventing cryptic motifs as shown above, we could have also used a `scikit-learn` `Lasso` model, as described in [our publication](https://www.nature.com/articles/s41587-020-0584-2), to design promoters within specific dynamic ranges. We would load the model (unpickle) into memory, and for every promoter completely constructed by `Maker`, we would evaluate the predicted initiation rate and only accept parts that satisfy our criteria (a global model function). Alternatively, we could use the model to identify which of the components (hexamers, spacer GC etc.) prevented a part from being accepted, and return a traceback index accordingly (i.e. convert the global into a local model function) to explore nucleotide choices concurrently with part design.

> **Note** It is always a good idea to test the individual functions called by a meta local or global model function, using simple cases. Notice how we have `assert`-ed a few test cases for the helper functions above.

> **Note** If `Maker` takes an impossibly long time to create a single part in the presence of a model function, it is worth investigating whether the model function gets stuck in an infinite loop or an edge case. A quick check is to run `Maker` with all constraints except the model function. If `Maker` is able to design parts quickly in the absence of the model function, then the slow-down is due to the model function itself, which should be investigated and optimized.
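
When investigating a slow model function, it can help to count how often it is called, how often it rejects, and how much time it consumes. The `profiled` decorator below is a hypothetical helper written for this purpose, not an `nrpcalc` feature; it preserves the single-argument `seq` signature of the wrapped function.

```python
import time
from functools import wraps

def profiled(model_fn):
    """Wrap a model function to record call count,
    rejection count, and cumulative run time."""
    @wraps(model_fn)
    def wrapper(seq):
        start = time.perf_counter()
        result = model_fn(seq)
        wrapper.calls += 1
        wrapper.seconds += time.perf_counter() - start
        # local functions return (state, index); global ones return a bool
        state = result[0] if isinstance(result, tuple) else result
        if not state:
            wrapper.rejections += 1
        return result
    wrapper.calls = 0
    wrapper.seconds = 0.0
    wrapper.rejections = 0
    return wrapper

@profiled
def toy_local(seq):
    # toy objective: partial sequence must not end in 'AAA'
    if seq.endswith('AAA'):
        return (False, len(seq) - 3)
    return (True, None)

for s in ['ACG', 'AAA', 'CAAA', 'ACGT']:
    toy_local(s)

print(toy_local.calls, toy_local.rejections)  # 4 2
```

A very high rejection fraction, or a large cumulative time relative to the number of calls, points to the objective being too strict or too expensive.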

Now that our meta local model function is ready, we can define a meta global model function that calls `final_cutsite_check` as well as `prevent_cryptic_promoter` like so.

In [38]:
# a meta global model function
def variable_promoter_global(seq):
    # check for cutsites post construction
    if not final_cutsite_check(seq):
        return False # cutsites found!
    
    # note: the following block could be
    # its own function, and called by this
    # meta global function
    
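    # check for cryptic hexamers
    # starting at the 6th base
    # (eval_index is the exclusive end of each hexamer,
    # so it must run up to and including len(seq))
    for eval_index in range(6, len(seq)+1):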
        # use the generalized evaluation function
        state, index = prevent_cryptic_promoter(
            seq=seq,
            c35start=20,
            c10start=43,
            eval_index=eval_index)
        # there is a cryptic hexamer ending
        # at the current location
        if state is False:
            return False
    
    # all checks passed!
    return True

With all our evaluators completed, we're ready to design our second toolbox of promoters. Let's call upon `Maker` to do our bidding.

In [39]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox2_attempt1 = nrpcalc.maker(
    seed=2,                                    # reproducible results
    seq_constr=tb2_seq_constraint,             # as defined above
    struct_constr=tb1_struct_constraint,       # same as toolbox1
    Lmax=12,                                   # as stated in our goal
    target_size=500,                           # as stated in our goal
    part_type='DNA',                           # as stated in our goal
    local_model_fn=variable_promoter_local,    # as defined above
    global_model_fn=variable_promoter_global)  # as defined above

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNNNNNNNNNNNNNTTGNNNNNNNNNCCNNNNNNNWWWWWWWWNNNNNN
 Structure Constraint: .......................................................
    Target Size      : 500 parts
           Lmax      : 12 bp
  Internal Repeats   : False

 Check Status: PASS

[Checking Arguments]
   Part Type : DNA
 Struct Type : mfe
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [13-mers] 43, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 2, [13-mers] 86, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 3, [13-mers] 129, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 4, [13-mers] 172, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 5, [13-mers] 215, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 6, [13-mers] 258, [iter time] 0.01s, [avg time] 0.01s, [to

 [part] 95, [13-mers] 4085, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 96, [13-mers] 4128, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 97, [13-mers] 4171, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 98, [13-mers] 4214, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 99, [13-mers] 4257, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 100, [13-mers] 4300, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 101, [13-mers] 4343, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 102, [13-mers] 4386, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 103, [13-mers] 4429, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 104, [13-mers] 4472, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 105, [13-mers] 4515, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 106, [13-mers] 4558, [iter time] 0.01s, [avg time] 0.01s, [tot

 [part] 215, [13-mers] 9245, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 216, [13-mers] 9288, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 217, [13-mers] 9331, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 218, [13-mers] 9374, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 219, [13-mers] 9417, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 220, [13-mers] 9460, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 221, [13-mers] 9503, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 222, [13-mers] 9546, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 223, [13-mers] 9589, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 224, [13-mers] 9632, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 225, [13-mers] 9675, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 226, [13-mers] 9718, [iter time] 0.00s, [avg time] 0.01s,

 [part] 342, [13-mers] 14706, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 343, [13-mers] 14749, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 344, [13-mers] 14792, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 345, [13-mers] 14835, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 346, [13-mers] 14878, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 347, [13-mers] 14921, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 348, [13-mers] 14964, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 349, [13-mers] 15007, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 350, [13-mers] 15050, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 351, [13-mers] 15093, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 352, [13-mers] 15136, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 353, [13-mers] 15179, [iter time] 0.01s, [avg 

 [part] 453, [13-mers] 19479, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 454, [13-mers] 19522, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 455, [13-mers] 19565, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 456, [13-mers] 19608, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 457, [13-mers] 19651, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 458, [13-mers] 19694, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 459, [13-mers] 19737, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 460, [13-mers] 19780, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 461, [13-mers] 19823, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 462, [13-mers] 19866, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 463, [13-mers] 19909, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 464, [13-mers] 19952, [iter time] 0.00s, [avg 

In [40]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 3.70s


Notice that the running time increased from less than one second for the first toolbox to about four seconds for the present toolbox. This is because the running time of `Maker` depends greatly on the complexity of the underlying model functions. For the second toolbox, we used both local and global model functions, each of which considered two sub-objectives. `Maker` was able to satisfy all of these objectives and finished in under four seconds.

Let's review our newly minted toolbox!

In [41]:
toolbox2_attempt1[0] # first promoter in second toolbox

'TCACCTAGCGCAGCGGTCAGTTGTACCGGGTCCCCCGCGTTTTTTTAATTCGATT'

In [42]:
toolbox2_attempt1[499] # last promoter in second toolbox

'CGACACGAAACTACGCGCCATTGTGTTCCTACCCGAGGAACAAAAAAAAGGACGC'

#### Designing the Second Toolbox - Second Attempt

The reason we called the previous sub-section a **"First Attempt"** is that the second toolbox we designed above is non-repetitive with respect to itself, but not against the `toolbox1` promoters designed a priori. To verify non-repetitiveness in construction, we can use `Finder Mode`.

In [43]:
# combine both toolboxes
promoters = list(toolbox1.values())
promoters.extend(toolbox2_attempt1.values())

In [44]:
# compute the number of non-repetitive promoters
non_repetitive_promoters = len(nrpcalc.finder(
    seq_list=promoters,
    Lmax=12))


[Non-Repetitive Parts Calculator - Finder Mode]

[Checking Constraints]
 Sequence List   : 1000 parts
          Lmax   : 12 bp
 Internal Repeats: False

 Check Status: PASS

[Checking Arguments]
   Vertex Cover: nrp2
   Output  File: None

 Check Status: PASS

Extracted 1000 unique sequences out of 1000 sequences in 0.0007942 seconds

Written 1000 unique sequences out to ./30e3bd00-43a8-4a78-b593-1668a6c49d59/seq_list.txt in 0.001939 seconds

 [Sequence processing remaining] = 1    
 [Cliques inserted] = 984 

Built homology graph in 0.8825 seconds. [Edges = 21] [Nodes = 1000]
 [Intital Nodes = 1000] - [Repetitive Nodes = 0] = [Final Nodes = 1000]

 [+] Initial independent set = 0, computing vertex cover on remaining 0 nodes.
 [+] Vertex Cover Function: NRP 2-approximation
 [+] Dumping graph into: ./30e3bd00-43a8-4a78-b593-1668a6c49d59/repeat_graph.txt in 0.009725570678710938 seconds

----------------------
Now running iteration: 0
----------------------

 Pendant checking is in progr

  [x] Isolated node 30 eliminated
  [x] Isolated node 991 eliminated
  [x] Isolated node 656 eliminated
  [x] Isolated node 299 eliminated
  [x] Isolated node 593 eliminated
  [x] Isolated node 236 eliminated
  [x] Isolated node 941 eliminated
  [x] Isolated node 878 eliminated
  [x] Isolated node 505 eliminated
  [x] Isolated node 559 eliminated
  [x] Isolated node 186 eliminated
  [x] Isolated node 123 eliminated
  [x] Isolated node 828 eliminated
  [x] Isolated node 471 eliminated
  [x] Isolated node 765 eliminated
  [x] Isolated node 136 eliminated
  [x] Isolated node 73 eliminated
  [x] Isolated node 778 eliminated
  [x] Isolated node 715 eliminated
  [x] Isolated node 358 eliminated
  [x] Isolated node 39 eliminated
  [x] Isolated node 984 eliminated
  [x] Isolated node 665 eliminated
  [x] Isolated node 308 eliminated
  [x] Isolated node 602 eliminated
  [x] Isolated node 245 eliminated
  [x] Isolated node 950 eliminated
  [x] Isolated node 887 eliminated
  [x] Isolated node 258


Non-Repetitive Toolbox Size: 983


In [45]:
assert non_repetitive_promoters < 1000 # we're short of our goal ... some promoters were repetitive

As we can see, the final non-repetitive toolbox obtained when both toolboxes are combined has fewer than $1000$ parts ($983$ in the run above), which is short of our intended goal. This is where the concept of **"background"** comes into play (check [DOCS](https://github.com/ayaanhossain/nrpcalc/blob/master/docs/DOCS.md) for `background` API details).

> **Note** `Finder` is an unstable algorithm by design. This means that, given a list of repetitive parts, the returned non-repetitive subset is approximately the largest possible non-repetitive toolbox, but it may change slightly across multiple runs on the same inputs. In general, there is no known way to be certain of the absolute largest toolbox size (unless all parts are either repetitive or non-repetitive, as is the case when verifying parts returned by `Maker`), so we retained a level of stochasticity in `Finder`. This encourages us to run `Finder` (which is pretty fast in practice) several times on a candidate toolbox of parts, and then select the largest non-repetitive toolbox returned across all runs.
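The advice in the note above can be sketched as a small helper that repeats a run and keeps the largest result. This is only an illustration: `largest_toolbox` is a hypothetical helper, and the stand-in below takes the place of a real call such as `lambda: nrpcalc.finder(seq_list=promoters, Lmax=12)` so that the snippet runs on its own.

```python
def largest_toolbox(run_finder, attempts=5):
    """Call run_finder() several times and keep the largest toolbox returned."""
    best = None
    for _ in range(attempts):
        toolbox = run_finder()
        if best is None or len(toolbox) > len(best):
            best = toolbox
    return best

# hypothetical stand-in: each call returns a toolbox of a different size,
# mimicking the run-to-run variation described in the note
fake_sizes = iter([981, 983, 982])

def fake_finder():
    return dict.fromkeys(range(next(fake_sizes)))

best = largest_toolbox(fake_finder, attempts=3)
print(len(best))  # 983
```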

To create the second toolbox while ensuring it is non-repetitive to the first one, we will populate a temporary `background` object.

In [46]:
bkg = nrpcalc.background(
    path='./tmp_bkg', # we store the background on disk in the 'tmp_bkg' directory on current path
    Lmax=12)          # same Lmax as toolbox1

In [47]:
bkg # checking background path and content, we see it has zero elements

kmerSetDB stored at ./tmp_bkg/ with 0 13-mers

In [48]:
# # we could add the promoters one-by-one
# for promoter in toolbox1.values():
#     bkg.add(promoter)

# or add it all in one-shot
bkg.multiadd(toolbox1.values())


[Background Processing]


  Adding Seq 267: GCCGGGAATA...

  Adding Seq 499: CCCGGACATC...

In [49]:
bkg # now, background is populated with toolbox1 k-mers

kmerSetDB stored at ./tmp_bkg/ with 21500 13-mers
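The jump from 0 to 21500 13-mers can be checked by hand. A part of length $n$ contains $n - (L_{max} + 1) + 1$ windows of length $L_{max} + 1$, and in a non-repetitive toolbox every one of those windows is distinct. The sketch below assumes the `toolbox1` promoters are $55$ bp long, like the sequence constraint shown earlier; `kmer_count` is a hypothetical helper, not an `nrpcalc` function.

```python
def kmer_count(part_length, lmax):
    """Number of (lmax + 1)-mers in one part of the given length."""
    return part_length - (lmax + 1) + 1

part_length = 55   # assumed promoter length, matching the sequence constraint
per_part = kmer_count(part_length, lmax=12)
total = 500 * per_part

print(per_part)  # 43
print(total)     # 21500
```

The per-part value agrees with the `Maker` log, which reports 43 13-mers after the first part.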

With our `background` populated, we are now ready to design our actual second toolbox such that it is indeed non-repetitive to the first toolbox, therefore allowing both toolboxes to be used simultaneously. This is what we refer to as **toolbox chaining**. For example, you can chain a `Maker` job against a genome inserted in `background`. You can also use `background` with `Finder` jobs to incrementally enlarge a central collection of parts from multiple sources.

In [50]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox2 = nrpcalc.maker(
    seed=3,                                    # reproducible results
    seq_constr=tb2_seq_constraint,             # as defined above
    struct_constr=tb1_struct_constraint,       # same as toolbox1
    Lmax=12,                                   # as stated in our goal
    target_size=500,                           # as stated in our goal
    part_type='DNA',                           # as stated in our goal
    local_model_fn=variable_promoter_local,    # as defined above
    global_model_fn=variable_promoter_global,  # as defined above
    background=bkg)                            # as defined above

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNNNNNNNNNNNNNTTGNNNNNNNNNCCNNNNNNNWWWWWWWWNNNNNN
 Structure Constraint: .......................................................
    Target Size      : 500 parts
           Lmax      : 12 bp
  Internal Repeats   : False

 Check Status: PASS

[Checking Background]
 Background: kmerSetDB stored at ./tmp_bkg/ with 21500 13-mers

 Check Status: PASS

[Checking Arguments]
   Part Type : DNA
 Struct Type : mfe
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [13-mers] 43, [iter time] 0.00s, [avg time] 0.00s, [total time] 0.00h
 [part] 2, [13-mers] 86, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 3, [13-mers] 129, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 4, [13-mers] 172, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 5, [13-mers] 215, [iter time] 0.0

 [part] 98, [13-mers] 4214, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 99, [13-mers] 4257, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 100, [13-mers] 4300, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 101, [13-mers] 4343, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 102, [13-mers] 4386, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 103, [13-mers] 4429, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 104, [13-mers] 4472, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 105, [13-mers] 4515, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 106, [13-mers] 4558, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 107, [13-mers] 4601, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 108, [13-mers] 4644, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 109, [13-mers] 4687, [iter time] 0.01s, [avg time] 0.01s, [

 [part] 201, [13-mers] 8643, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 202, [13-mers] 8686, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 203, [13-mers] 8729, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 204, [13-mers] 8772, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 205, [13-mers] 8815, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 206, [13-mers] 8858, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 207, [13-mers] 8901, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 208, [13-mers] 8944, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 209, [13-mers] 8987, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 210, [13-mers] 9030, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 211, [13-mers] 9073, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 212, [13-mers] 9116, [iter time] 0.00s, [avg time] 0.01s,

 [part] 307, [13-mers] 13201, [iter time] 0.03s, [avg time] 0.01s, [total time] 0.00h
 [part] 308, [13-mers] 13244, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 309, [13-mers] 13287, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 310, [13-mers] 13330, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 311, [13-mers] 13373, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 312, [13-mers] 13416, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 313, [13-mers] 13459, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 314, [13-mers] 13502, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 315, [13-mers] 13545, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 316, [13-mers] 13588, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 317, [13-mers] 13631, [iter time] 0.04s, [avg time] 0.01s, [total time] 0.00h
 [part] 318, [13-mers] 13674, [iter time] 0.00s, [avg 

 [part] 409, [13-mers] 17587, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 410, [13-mers] 17630, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 411, [13-mers] 17673, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 412, [13-mers] 17716, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 413, [13-mers] 17759, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 414, [13-mers] 17802, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 415, [13-mers] 17845, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 416, [13-mers] 17888, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 417, [13-mers] 17931, [iter time] 0.00s, [avg time] 0.01s, [total time] 0.00h
 [part] 418, [13-mers] 17974, [iter time] 0.02s, [avg time] 0.01s, [total time] 0.00h
 [part] 419, [13-mers] 18017, [iter time] 0.01s, [avg time] 0.01s, [total time] 0.00h
 [part] 420, [13-mers] 18060, [iter time] 0.01s, [avg 

In [51]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 4.48s


Notice that the running time increased from under four seconds in our previous attempt to slightly more than four seconds in our current attempt. This is because we introduced `background` as an additional constraint for the design job above.

Let's update our `background` and use `Finder` again to verify our construction of the second toolbox.

In [52]:
bkg.multiadd(toolbox2.values()) # updated with second toolbox for building next set of parts


[Background Processing]


  Adding Seq 267: AACTACTCAA...

  Adding Seq 499: AAGACTGCAC...

In [53]:
# recreate the list of promoters for evaluation
promoters = list(toolbox1.values()) + list(toolbox2.values())

In [54]:
# assess non-repetitiveness
non_repetitive_promoters = len(nrpcalc.finder(
    seq_list=promoters,
    Lmax=12))


[Non-Repetitive Parts Calculator - Finder Mode]

[Checking Constraints]
 Sequence List   : 1000 parts
          Lmax   : 12 bp
 Internal Repeats: False

 Check Status: PASS

[Checking Arguments]
   Vertex Cover: nrp2
   Output  File: None

 Check Status: PASS

Extracted 1000 unique sequences out of 1000 sequences in 0.001378 seconds

Written 1000 unique sequences out to ./b886335c-8f9a-4823-8008-4f1068ac8c2c/seq_list.txt in 0.002599 seconds

 [Sequence processing remaining] = 1    
 [Cliques inserted] = 1000

Built homology graph in 0.8221 seconds. [Edges = 0] [Nodes = 1000]
 [Intital Nodes = 1000] - [Repetitive Nodes = 0] = [Final Nodes = 1000]

 [+] Initial independent set = 0, computing vertex cover on remaining 0 nodes.
 [+] Vertex Cover Function: NRP 2-approximation
 [+] Dumping graph into: ./b886335c-8f9a-4823-8008-4f1068ac8c2c/repeat_graph.txt in 0.0017783641815185547 seconds

----------------------
Now running iteration: 0
----------------------

 Pendant checking is in progre

  [x] Isolated node 111 eliminated
  [x] Isolated node 800 eliminated
  [x] Isolated node 443 eliminated
  [x] Isolated node 737 eliminated
  [x] Isolated node 380 eliminated
  [x] Isolated node 61 eliminated
  [x] Isolated node 393 eliminated
  [x] Isolated node 703 eliminated
  [x] Isolated node 330 eliminated
  [x] Isolated node 624 eliminated
  [x] Isolated node 11 eliminated
  [x] Isolated node 972 eliminated
  [x] Isolated node 653 eliminated
  [x] Isolated node 280 eliminated
  [x] Isolated node 590 eliminated
  [x] Isolated node 217 eliminated
  [x] Isolated node 922 eliminated
  [x] Isolated node 859 eliminated
  [x] Isolated node 502 eliminated
  [x] Isolated node 540 eliminated
  [x] Isolated node 183 eliminated
  [x] Isolated node 104 eliminated
  [x] Isolated node 809 eliminated
  [x] Isolated node 452 eliminated
  [x] Isolated node 746 eliminated
  [x] Isolated node 133 eliminated
  [x] Isolated node 70 eliminated
  [x] Isolated node 775 eliminated
  [x] Isolated node 402

  [x] Isolated node 591 eliminated
  [x] Isolated node 218 eliminated
  [x] Isolated node 923 eliminated
  [x] Isolated node 860 eliminated
  [x] Isolated node 503 eliminated
  [x] Isolated node 541 eliminated
  [x] Isolated node 168 eliminated
  [x] Isolated node 105 eliminated
  [x] Isolated node 810 eliminated
  [x] Isolated node 453 eliminated
  [x] Isolated node 747 eliminated
  [x] Isolated node 134 eliminated
  [x] Isolated node 71 eliminated
  [x] Isolated node 403 eliminated
  [x] Isolated node 697 eliminated
  [x] Isolated node 340 eliminated
  [x] Isolated node 634 eliminated
  [x] Isolated node 21 eliminated
  [x] Isolated node 982 eliminated
  [x] Isolated node 663 eliminated
  [x] Isolated node 290 eliminated
  [x] Isolated node 584 eliminated
  [x] Isolated node 227 eliminated
  [x] Isolated node 932 eliminated
  [x] Isolated node 869 eliminated
  [x] Isolated node 496 eliminated
  [x] Isolated node 550 eliminated
  [x] Isolated node 177 eliminated
  [x] Isolated node 11

  [x] Isolated node 282 eliminated
  [x] Isolated node 576 eliminated
  [x] Isolated node 219 eliminated
  [x] Isolated node 924 eliminated
  [x] Isolated node 861 eliminated
  [x] Isolated node 488 eliminated
  [x] Isolated node 542 eliminated
  [x] Isolated node 169 eliminated
  [x] Isolated node 106 eliminated
  [x] Isolated node 811 eliminated
  [x] Isolated node 454 eliminated
  [x] Isolated node 748 eliminated
  [x] Isolated node 135 eliminated
  [x] Isolated node 56 eliminated
  [x] Isolated node 404 eliminated
  [x] Isolated node 698 eliminated
  [x] Isolated node 341 eliminated
  [x] Isolated node 635 eliminated
  [x] Isolated node 22 eliminated
  [x] Isolated node 983 eliminated
  [x] Isolated node 648 eliminated
  [x] Isolated node 291 eliminated
  [x] Isolated node 585 eliminated
  [x] Isolated node 228 eliminated
  [x] Isolated node 933 eliminated
  [x] Isolated node 870 eliminated
  [x] Isolated node 497 eliminated
  [x] Isolated node 551 eliminated
  [x] Isolated node 17

In [55]:
assert non_repetitive_promoters == 1000 # no promoters missing!

### Non-Repetitive Ribosome Binding Sites with `Lmax=14`

To complement our designed promoter toolboxes, we will next design a toolbox of prokaryotic ribosome binding sites (RBSs). We will primarily be using findings from [Salis et al. (2009)](https://www.nature.com/articles/nbt.1568) for designing our RBSs.

We aim to design $1000$ _de novo_ RBS sequences that are non-repetitive to our promoter toolboxes designed in the previous sections. Our RBS sequence constraint is therefore highly degenerate, containing a $26$-bp upstream region, a $4$-bp standby site, and a $9$-bp consensus Shine-Dalgarno (SD) motif ('UAAGGAGGA') separated from the start codon ('AUG') by a near-optimal $6$-bp spacer. Importantly, the coupled structure constraint mandates a small hairpin on the $5'$-end of designed sequences to insulate the RBS against quick mRNA decay, while ensuring that the Shine-Dalgarno motif and everything downstream remain unstructured.

Let's define and review our constraints.

In [56]:
tb3_seq_constraint = 'N'*26 + 'N'*4 + 'UAAGGAGGA' + 'N'*6 + 'AUG'
#                     -----    ----    ---------     ----    ---
#                    SPACER    SBS     SD Motif    SPACER  START

In [57]:
tb3_struct_constraint = '.(((((....)))))...............xxxxxxxxxxxxxxxxxx'

In [58]:
print(tb3_seq_constraint)
print(tb3_struct_constraint)

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNUAAGGAGGANNNNNNAUG
.(((((....)))))...............xxxxxxxxxxxxxxxxxx


The dots (`.`) in the structure constraint imply that the bases in the sequence constraint at the corresponding locations are free to either base-pair or not when a candidate part is generated. Bases marked with parentheses (`(` and `)`) indicate that the folded structure must contain those designated base-pairings; for example, the second base must pair with the fifteenth base, and so on. Bases marked with `x` are forbidden from being part of any base pairing in the secondary `RNA` structure. This _dot-parenthesis-x_ notation is inspired by the secondary structure notation used by nucleic acid structure prediction programs such as `ViennaRNA`.
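To make the notation concrete, the following sketch reads a _dot-parenthesis-x_ string and lists the required base pairs and the positions that must stay unpaired, using 1-based positions. It is an illustration only: `parse_structure_constraint` is a hypothetical helper, not how `Maker` parses constraints, and the structure is written piecewise but is intended to equal `tb3_struct_constraint`.

```python
def parse_structure_constraint(structure):
    """Return (pairs, forbidden) using 1-based positions.

    pairs     : sorted list of (i, j) positions required to base-pair
    forbidden : list of positions marked 'x' (must stay unpaired)
    """
    stack, pairs, forbidden = [], [], []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == '(':
            stack.append(pos)
        elif symbol == ')':
            if not stack:
                raise ValueError('unbalanced ")" at position {}'.format(pos))
            pairs.append((stack.pop(), pos))
        elif symbol == 'x':
            forbidden.append(pos)
        elif symbol != '.':
            raise ValueError('unknown symbol {!r} at position {}'.format(symbol, pos))
    if stack:
        raise ValueError('unbalanced "(" at position {}'.format(stack[0]))
    return sorted(pairs), forbidden

# free base, 5-bp stem, 4-base loop, closing stem, 15 free bases, 18 unpaired bases
structure = '.' + '(' * 5 + '.' * 4 + ')' * 5 + '.' * 15 + 'x' * 18
pairs, forbidden = parse_structure_constraint(structure)

print(pairs)         # [(2, 15), (3, 14), (4, 13), (5, 12), (6, 11)]
print(forbidden[0])  # 31, the first base of the SD motif
```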

Before we design the RBS toolbox, we must note that the constraint for the RBS toolbox here includes an `Lmax` of $14$, whereas the promoters were designed with an `Lmax` of $12$ bases. This is because there is a large $9$-bp constant Shine-Dalgarno motif in the sequence constraint, which doesn't leave many $13$-mers (recall `Lmax=12`) for constructing thousands of non-repetitive RBSs. As proof, let's try constructing the RBS toolbox with `Lmax=12`, without using any `background` developed previously and using only the fast `mfe` (minimum free energy) structure evaluation (a relaxed design scenario).

In [59]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox3_attempt1 = nrpcalc.maker(
    seed=4,                                    # reproducible results
    seq_constr=tb3_seq_constraint,             # as defined above
    struct_constr=tb3_struct_constraint,       # as defined above
    Lmax=12,                                   # deliberately lower than our goal of 14
    target_size=1000,                          # as stated in our goal
    part_type='RNA',                           # as stated in our goal
    struct_type='mfe',                         # fast minimum free energy evaluation
    local_model_fn=None,                       # no local model function
    global_model_fn=None,                      # no global model function
    background=None)                           # no background

# Record execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNUAAGGAGGANNNNNNAUG
 Structure Constraint: .(((((....)))))...............xxxxxxxxxxxxxxxxxx
    Target Size      : 1000 parts
           Lmax      : 12 bp
  Internal Repeats   : False


 Check Status: PASS

[Checking Arguments]
   Part Type : RNA
 Struct Type : mfe
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [13-mers] 36, [iter time] 0.03s, [avg time] 0.03s, [total time] 0.00h
 [part] 2, [13-mers] 72, [iter time] 0.16s, [avg time] 0.10s, [total time] 0.00h
 [part] 3, [13-mers] 108, [iter time] 0.00s, [avg time] 0.07s, [total time] 0.00h
 [part] 4, [13-mers] 144, [iter time] 0.09s, [avg time] 0.07s, [total time] 0.00h
 [part] 5, [13-mers] 180, [iter time] 0.19s, [avg time] 0.10s, [total time] 0.00h
 [part] 6, [13-mers] 216, [iter time] 0.00s, [avg time] 0.08s, [total time] 0.

 [part] 91, [13-mers] 3276, [iter time] 0.16s, [avg time] 0.11s, [total time] 0.00h
 [part] 92, [13-mers] 3312, [iter time] 0.04s, [avg time] 0.11s, [total time] 0.00h
 [part] 93, [13-mers] 3348, [iter time] 0.08s, [avg time] 0.11s, [total time] 0.00h
 [part] 94, [13-mers] 3384, [iter time] 0.08s, [avg time] 0.11s, [total time] 0.00h
 [part] 95, [13-mers] 3420, [iter time] 0.06s, [avg time] 0.11s, [total time] 0.00h
 [part] 96, [13-mers] 3456, [iter time] 0.05s, [avg time] 0.11s, [total time] 0.00h
 [part] 97, [13-mers] 3492, [iter time] 0.00s, [avg time] 0.10s, [total time] 0.00h
 [part] 98, [13-mers] 3528, [iter time] 0.13s, [avg time] 0.10s, [total time] 0.00h
 [part] 99, [13-mers] 3564, [iter time] 0.19s, [avg time] 0.11s, [total time] 0.00h
 [part] 100, [13-mers] 3600, [iter time] 0.00s, [avg time] 0.10s, [total time] 0.00h
 [part] 101, [13-mers] 3636, [iter time] 0.32s, [avg time] 0.11s, [total time] 0.00h
 [part] 102, [13-mers] 3672, [iter time] 0.04s, [avg time] 0.11s, [total t

 [part] 188, [13-mers] 6768, [iter time] 1.25s, [avg time] 0.23s, [total time] 0.01h
 [part] 189, [13-mers] 6804, [iter time] 1.99s, [avg time] 0.24s, [total time] 0.01h
 [part] 190, [13-mers] 6840, [iter time] 0.26s, [avg time] 0.24s, [total time] 0.01h
 [part] 191, [13-mers] 6876, [iter time] 2.70s, [avg time] 0.25s, [total time] 0.01h
 [part] 192, [13-mers] 6912, [iter time] 3.38s, [avg time] 0.27s, [total time] 0.01h
 [part] 193, [13-mers] 6948, [iter time] 2.41s, [avg time] 0.28s, [total time] 0.02h
 [part] 194, [13-mers] 6984, [iter time] 4.88s, [avg time] 0.30s, [total time] 0.02h
 [part] 195, [13-mers] 7020, [iter time] 9.63s, [avg time] 0.35s, [total time] 0.02h
 [part] 196, [13-mers] 7056, [iter time] 0.02s, [avg time] 0.35s, [total time] 0.02h
 [part] 197, [13-mers] 7092, [iter time] 3.33s, [avg time] 0.36s, [total time] 0.02h
 [part] 198, [13-mers] 7128, [iter time] 1.49s, [avg time] 0.37s, [total time] 0.02h
 [part] 199, [13-mers] 7164, [iter time] 9.56s, [avg time] 0.42s,

In [62]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 493.64s


As we can see in the output, `Maker` first warned us that it might not be able to make $1000$ parts as specified in `target_size`, but it ventured forth, taking a little over eight minutes to explore the design space and construct $200+$ RBSs before giving up.

The $9$-bp constant motif in the sequence constraint leaves only $4$ degenerate bases in every $13$-bp window containing the complete SD sequence, implying at most $4^4 = 256$ possible parts for the given sequence constraint. Such _k_-mer windows, which limit the overall design space and prevent `target_size` from being reached, are called **`Lmax` limiting windows**. `Maker` was able to design $216$ of the maximum possible toolbox size before failing to find suitable _k_-mers for making newer RBSs. If we wanted, we could try increasing the `jump_count` and/or `fail_count` (see [API](https://github.com/ayaanhossain/nrpcalc/blob/master/docs/DOCS.md)) to try to reach all $256$ of the possible RBSs, although the severe structure constraint might prevent selection of some _k_-mers and keep us from realizing all of these sequences.
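The window arithmetic above can be reproduced with a short, self-contained sketch. The helper `min_window_capacity` and the degeneracy table are illustrative names and are not part of `nrpcalc`. The sketch only counts degeneracy in the sequence constraint and ignores the structure constraint, so it gives an upper bound rather than the toolbox size `Maker` will actually reach.

```python
# Hypothetical helper (not part of nrpcalc): estimate the k-mer design
# space of the most restrictive window in a sequence constraint.

# Number of concrete bases each IUPAC code can stand for
IUPAC_CHOICES = {
    'A': 1, 'C': 1, 'G': 1, 'T': 1, 'U': 1,
    'R': 2, 'Y': 2, 'S': 2, 'W': 2, 'K': 2, 'M': 2,
    'B': 3, 'D': 3, 'H': 3, 'V': 3,
    'N': 4,
}

def min_window_capacity(seq_constr, Lmax):
    """Smallest number of distinct (Lmax+1)-mers that any window of the
    sequence constraint can realize (None if the constraint is too short)."""
    k = Lmax + 1
    best = None
    for i in range(len(seq_constr) - k + 1):
        capacity = 1
        for base in seq_constr[i:i+k]:
            capacity *= IUPAC_CHOICES[base]
        if best is None or capacity < best:
            best = capacity
    return best

# Same layout as the RBS sequence constraint used in this section
rbs_constraint = 'N'*30 + 'UAAGGAGGA' + 'N'*6 + 'AUG'

print(min_window_capacity(rbs_constraint, Lmax=12))  # 256
print(min_window_capacity(rbs_constraint, Lmax=14))  # 4096
```

The limiting window sets a ceiling on the toolbox size because every part must contribute a distinct _k_-mer at that position.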

Our goal, however, is to build $1000$ RBSs, which is clearly not possible given an `Lmax` of $12$ for the specified sequence constraint. We could try introducing more degeneracy into the SD motif, which might relax our constraints enough to fix the issue. But if we don't want to alter the motif, we would have to increase our `Lmax` to expand our design space. An `Lmax` of $14$ seems reasonable, giving `Maker` $4^6 = 4096$ possible _k_-mer selection choices for all $15$-bp windows encompassing the SD motif.

Now that we've decided to use `Lmax=14` for our toolbox, how do we unify our present RBS toolbox with the previously designed promoter toolboxes with `Lmax=12`, in terms of non-repetitiveness? It is a feature of `NRP Calculator` that one can use a `background` initialized with a lower `Lmax` for a design job specifying a higher `Lmax`. So, if designing RBSs was our last job, it would be legal and recommended to use `bkg` as the `background`, since its `Lmax=12`, and our new RBS toolbox would be built with an `Lmax` of $14$ (higher). The opposite scenario is not supported by `Maker` for algorithmic efficiency reasons. This implies that toolbox chaining can progressively move from a lower `Lmax` to a higher one, but not the other way round.
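One way to see why moving from a lower `Lmax` to a higher one is safe: every $15$-mer contains a $13$-mer, so two sequences that share no $13$-mer cannot share a $15$-mer either, while the converse does not hold. The sketch below illustrates this with plain Python sets. The helpers `kmers` and `shares_kmer` and the toy sequences are made up for illustration, and nothing here is meant to describe how `Maker` checks a `background` internally.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i+k] for i in range(len(seq) - k + 1)}

def shares_kmer(a, b, k):
    """True if a and b have at least one k-mer in common."""
    return not kmers(a, k).isdisjoint(kmers(b, k))

core13 = 'ACGTTGCAAGCTA'   # an arbitrary 13-mer

# Two 15-bp toy parts that share a 13-mer but differ in their last two bases
part_a = core13 + 'CC'
part_b = core13 + 'GG'
print(shares_kmer(part_a, part_b, 13))  # True: repetitive at Lmax=12
print(shares_kmer(part_a, part_b, 15))  # False: still acceptable at Lmax=14

# Two toy parts that share a 15-mer necessarily share a 13-mer as well
core15 = core13 + 'TG'
part_c = 'T' + core15 + 'A'
part_d = 'G' + core15 + 'C'
print(shares_kmer(part_c, part_d, 15))  # True
print(shares_kmer(part_c, part_d, 13))  # True
```

In other words, a check against $13$-mers is stricter than a check against $15$-mers, so parts cleared against an `Lmax=12` collection are also clear of it at `Lmax=14`.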

Alternative approaches include (1) initializing a new `background` with `Lmax=14` and inserting all previous promoters into it, followed by using the new `background` for designing RBSs, or (2) defining a new local model function that prevents every $13$-mer in the new RBSs under construction from coinciding with the previous `background` (`bkg`) containing the $13$-mers from the promoters.

The first alternative is straightforward, and it is the one we'll use for designing our toolboxes in the subsequent sections: we'd be permanently moving to `Lmax=14` for the present and subsequent toolboxes, so `bkg`, with its `Lmax=12`, wouldn't be appropriate there. For our current RBS toolbox design, however, we'll look at the second alternative to see an example of a model function that works with `background` objects.

Let's look at a possible local model function for achieving the second alternative option.

In [63]:
def prevent_promoter_conflict(seq):
    # evaluation criteria met?
    if len(seq) >= 13:
        # extract the most recently completed 13-mer
        kmer = seq[-13:]
        # check for coincidence with the promoter background
        if kmer in bkg: # k-mer conflict found
            return (False, len(seq)-13) # retrace path
        # no conflict
        return (True, None)
    # too short a sequence
    else:
        return (True, None)

In [64]:
# toolbox1 promoter fails our evaluation as expected
assert prevent_promoter_conflict(seq=toolbox1[0]) == (False, len(toolbox1[0])-13)

In [65]:
# toolbox2 promoter also fails our evaluation as expected
assert prevent_promoter_conflict(seq=toolbox2[499]) == (False, len(toolbox2[499])-13)

In [66]:
# a poly-G 13-mer was absent in the promoters, so it is OK to be used for the RBSs
assert prevent_promoter_conflict(seq='G'*13) == (True, None)
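The same return convention can be tried out without `nrpcalc` or the promoter toolboxes at hand. The sketch below swaps the `background` object for a plain Python `set` of $13$-mers, which is only a stand-in: a real `background` may treat _k_-mers differently (for example, with respect to reverse complements). The factory `make_conflict_checker` and the toy promoter string are hypothetical.

```python
def make_conflict_checker(known_kmers, k=13):
    """Build a local model function that rejects a growing sequence whose
    last k bases form a k-mer already present in known_kmers."""
    def check(seq):
        if len(seq) >= k and seq[-k:] in known_kmers:
            # conflict: report where the offending k-mer starts
            return (False, len(seq) - k)
        # too short to evaluate, or no conflict
        return (True, None)
    return check

# An arbitrary toy DNA string standing in for a previously designed promoter
promoter = 'TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC'
toy_bkg = {promoter[i:i+13] for i in range(len(promoter) - 13 + 1)}

check = make_conflict_checker(toy_bkg)
print(check(promoter))      # (False, len(promoter) - 13): its last 13-mer is known
print(check('G'*13))        # (True, None): poly-G 13-mer is absent
print(check(promoter[:5]))  # (True, None): too short to evaluate
```

Wrapping the _k_-mer store in a closure keeps the function signature down to a single `seq` argument, matching the shape of `prevent_promoter_conflict` above.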

Our local model function is done, and our `Lmax` is revised to $14$. However, unlike the previous RBS toolbox design attempt, instead of relying on just `mfe` for structure evaluation, we'll use `mfe` + `centroid` = `both` as our `struct_type` parameter to ensure both `mfe` and `centroid` conform to the given structure constraint. This ensures that designed parts fold into a given structure with high probability (at the cost of slightly increased computation time).

In [67]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox3 = nrpcalc.maker(
    seed=5,                                    # reproducible results
    seq_constr=tb3_seq_constraint,             # as defined above
    struct_constr=tb3_struct_constraint,       # as defined above
    Lmax=14,                                   # as revised from our previous attempt
    target_size=1000,                          # as stated in our goal
    part_type='RNA',                           # as stated in our goal
    struct_type='both',                        # as revised from our previous attempt
    local_model_fn=prevent_promoter_conflict,  # as defined above
    global_model_fn=None,                      # none required
    background=None)                           # background conflict resolved via local model

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNUAAGGAGGANNNNNNAUG
 Structure Constraint: .(((((....)))))...............xxxxxxxxxxxxxxxxxx
    Target Size      : 1000 parts
           Lmax      : 14 bp
  Internal Repeats   : False

 Check Status: PASS

[Checking Arguments]
   Part Type : RNA
 Struct Type : both
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [15-mers] 34, [iter time] 0.03s, [avg time] 0.03s, [total time] 0.00h
 [part] 2, [15-mers] 68, [iter time] 0.08s, [avg time] 0.06s, [total time] 0.00h
 [part] 3, [15-mers] 102, [iter time] 0.21s, [avg time] 0.11s, [total time] 0.00h
 [part] 4, [15-mers] 136, [iter time] 0.06s, [avg time] 0.09s, [total time] 0.00h
 [part] 5, [15-mers] 170, [iter time] 0.01s, [avg time] 0.08s, [total time] 0.00h
 [part] 6, [15-mers] 204, [iter time] 0.06s, [avg time] 0.07s, [total time] 0.

 [part] 95, [15-mers] 3230, [iter time] 0.25s, [avg time] 0.10s, [total time] 0.00h
 [part] 96, [15-mers] 3264, [iter time] 0.02s, [avg time] 0.10s, [total time] 0.00h
 [part] 97, [15-mers] 3298, [iter time] 0.02s, [avg time] 0.10s, [total time] 0.00h
 [part] 98, [15-mers] 3332, [iter time] 0.10s, [avg time] 0.10s, [total time] 0.00h
 [part] 99, [15-mers] 3366, [iter time] 0.20s, [avg time] 0.10s, [total time] 0.00h
 [part] 100, [15-mers] 3400, [iter time] 0.11s, [avg time] 0.10s, [total time] 0.00h
 [part] 101, [15-mers] 3434, [iter time] 0.10s, [avg time] 0.10s, [total time] 0.00h
 [part] 102, [15-mers] 3468, [iter time] 0.05s, [avg time] 0.10s, [total time] 0.00h
 [part] 103, [15-mers] 3502, [iter time] 0.22s, [avg time] 0.10s, [total time] 0.00h
 [part] 104, [15-mers] 3536, [iter time] 0.01s, [avg time] 0.10s, [total time] 0.00h
 [part] 105, [15-mers] 3570, [iter time] 0.05s, [avg time] 0.10s, [total time] 0.00h
 [part] 106, [15-mers] 3604, [iter time] 0.25s, [avg time] 0.10s, [tot

 [part] 194, [15-mers] 6596, [iter time] 0.07s, [avg time] 0.10s, [total time] 0.01h
 [part] 195, [15-mers] 6630, [iter time] 0.09s, [avg time] 0.10s, [total time] 0.01h
 [part] 196, [15-mers] 6664, [iter time] 0.24s, [avg time] 0.10s, [total time] 0.01h
 [part] 197, [15-mers] 6698, [iter time] 0.02s, [avg time] 0.10s, [total time] 0.01h
 [part] 198, [15-mers] 6732, [iter time] 0.05s, [avg time] 0.10s, [total time] 0.01h
 [part] 199, [15-mers] 6766, [iter time] 0.38s, [avg time] 0.10s, [total time] 0.01h
 [part] 200, [15-mers] 6800, [iter time] 0.03s, [avg time] 0.10s, [total time] 0.01h
 [part] 201, [15-mers] 6834, [iter time] 0.12s, [avg time] 0.10s, [total time] 0.01h
 [part] 202, [15-mers] 6868, [iter time] 0.14s, [avg time] 0.10s, [total time] 0.01h
 [part] 203, [15-mers] 6902, [iter time] 0.18s, [avg time] 0.10s, [total time] 0.01h
 [part] 204, [15-mers] 6936, [iter time] 0.01s, [avg time] 0.10s, [total time] 0.01h
 [part] 205, [15-mers] 6970, [iter time] 0.45s, [avg time] 0.10s,

 [part] 296, [15-mers] 10064, [iter time] 0.15s, [avg time] 0.10s, [total time] 0.01h
 [part] 297, [15-mers] 10098, [iter time] 0.22s, [avg time] 0.10s, [total time] 0.01h
 [part] 298, [15-mers] 10132, [iter time] 0.18s, [avg time] 0.11s, [total time] 0.01h
 [part] 299, [15-mers] 10166, [iter time] 0.17s, [avg time] 0.11s, [total time] 0.01h
 [part] 300, [15-mers] 10200, [iter time] 0.14s, [avg time] 0.11s, [total time] 0.01h
 [part] 301, [15-mers] 10234, [iter time] 0.17s, [avg time] 0.11s, [total time] 0.01h
 [part] 302, [15-mers] 10268, [iter time] 0.20s, [avg time] 0.11s, [total time] 0.01h
 [part] 303, [15-mers] 10302, [iter time] 0.05s, [avg time] 0.11s, [total time] 0.01h
 [part] 304, [15-mers] 10336, [iter time] 0.29s, [avg time] 0.11s, [total time] 0.01h
 [part] 305, [15-mers] 10370, [iter time] 0.07s, [avg time] 0.11s, [total time] 0.01h
 [part] 306, [15-mers] 10404, [iter time] 0.15s, [avg time] 0.11s, [total time] 0.01h
 [part] 307, [15-mers] 10438, [iter time] 0.14s, [avg 

 [part] 392, [15-mers] 13328, [iter time] 0.83s, [avg time] 0.11s, [total time] 0.01h
 [part] 393, [15-mers] 13362, [iter time] 0.06s, [avg time] 0.11s, [total time] 0.01h
 [part] 394, [15-mers] 13396, [iter time] 0.06s, [avg time] 0.11s, [total time] 0.01h
 [part] 395, [15-mers] 13430, [iter time] 0.13s, [avg time] 0.11s, [total time] 0.01h
 [part] 396, [15-mers] 13464, [iter time] 0.85s, [avg time] 0.12s, [total time] 0.01h
 [part] 397, [15-mers] 13498, [iter time] 0.01s, [avg time] 0.12s, [total time] 0.01h
 [part] 398, [15-mers] 13532, [iter time] 0.03s, [avg time] 0.12s, [total time] 0.01h
 [part] 399, [15-mers] 13566, [iter time] 0.03s, [avg time] 0.12s, [total time] 0.01h
 [part] 400, [15-mers] 13600, [iter time] 0.19s, [avg time] 0.12s, [total time] 0.01h
 [part] 401, [15-mers] 13634, [iter time] 0.23s, [avg time] 0.12s, [total time] 0.01h
 [part] 402, [15-mers] 13668, [iter time] 0.01s, [avg time] 0.12s, [total time] 0.01h
 [part] 403, [15-mers] 13702, [iter time] 0.12s, [avg 

 [part] 490, [15-mers] 16660, [iter time] 0.07s, [avg time] 0.12s, [total time] 0.02h
 [part] 491, [15-mers] 16694, [iter time] 0.26s, [avg time] 0.12s, [total time] 0.02h
 [part] 492, [15-mers] 16728, [iter time] 0.02s, [avg time] 0.12s, [total time] 0.02h
 [part] 493, [15-mers] 16762, [iter time] 0.08s, [avg time] 0.12s, [total time] 0.02h
 [part] 494, [15-mers] 16796, [iter time] 0.59s, [avg time] 0.12s, [total time] 0.02h
 [part] 495, [15-mers] 16830, [iter time] 0.01s, [avg time] 0.12s, [total time] 0.02h
 [part] 496, [15-mers] 16864, [iter time] 0.33s, [avg time] 0.12s, [total time] 0.02h
 [part] 497, [15-mers] 16898, [iter time] 0.32s, [avg time] 0.12s, [total time] 0.02h
 [part] 498, [15-mers] 16932, [iter time] 0.11s, [avg time] 0.12s, [total time] 0.02h
 [part] 499, [15-mers] 16966, [iter time] 0.29s, [avg time] 0.13s, [total time] 0.02h
 [part] 500, [15-mers] 17000, [iter time] 0.07s, [avg time] 0.13s, [total time] 0.02h
 [part] 501, [15-mers] 17034, [iter time] 1.11s, [avg 

 [part] 587, [15-mers] 19958, [iter time] 0.92s, [avg time] 0.13s, [total time] 0.02h
 [part] 588, [15-mers] 19992, [iter time] 0.34s, [avg time] 0.13s, [total time] 0.02h
 [part] 589, [15-mers] 20026, [iter time] 0.05s, [avg time] 0.13s, [total time] 0.02h
 [part] 590, [15-mers] 20060, [iter time] 0.02s, [avg time] 0.13s, [total time] 0.02h
 [part] 591, [15-mers] 20094, [iter time] 0.11s, [avg time] 0.13s, [total time] 0.02h
 [part] 592, [15-mers] 20128, [iter time] 0.59s, [avg time] 0.14s, [total time] 0.02h
 [part] 593, [15-mers] 20162, [iter time] 0.02s, [avg time] 0.13s, [total time] 0.02h
 [part] 594, [15-mers] 20196, [iter time] 0.31s, [avg time] 0.14s, [total time] 0.02h
 [part] 595, [15-mers] 20230, [iter time] 0.02s, [avg time] 0.14s, [total time] 0.02h
 [part] 596, [15-mers] 20264, [iter time] 0.06s, [avg time] 0.13s, [total time] 0.02h
 [part] 597, [15-mers] 20298, [iter time] 0.05s, [avg time] 0.13s, [total time] 0.02h
 [part] 598, [15-mers] 20332, [iter time] 0.60s, [avg 

 [part] 683, [15-mers] 23222, [iter time] 0.22s, [avg time] 0.14s, [total time] 0.03h
 [part] 684, [15-mers] 23256, [iter time] 0.56s, [avg time] 0.14s, [total time] 0.03h
 [part] 685, [15-mers] 23290, [iter time] 0.55s, [avg time] 0.14s, [total time] 0.03h
 [part] 686, [15-mers] 23324, [iter time] 0.51s, [avg time] 0.14s, [total time] 0.03h
 [part] 687, [15-mers] 23358, [iter time] 0.20s, [avg time] 0.14s, [total time] 0.03h
 [part] 688, [15-mers] 23392, [iter time] 0.59s, [avg time] 0.14s, [total time] 0.03h
 [part] 689, [15-mers] 23426, [iter time] 0.06s, [avg time] 0.14s, [total time] 0.03h
 [part] 690, [15-mers] 23460, [iter time] 0.31s, [avg time] 0.14s, [total time] 0.03h
 [part] 691, [15-mers] 23494, [iter time] 0.01s, [avg time] 0.14s, [total time] 0.03h
 [part] 692, [15-mers] 23528, [iter time] 0.18s, [avg time] 0.14s, [total time] 0.03h
 [part] 693, [15-mers] 23562, [iter time] 0.13s, [avg time] 0.14s, [total time] 0.03h
 [part] 694, [15-mers] 23596, [iter time] 0.04s, [avg 

 [part] 780, [15-mers] 26520, [iter time] 0.11s, [avg time] 0.15s, [total time] 0.03h
 [part] 781, [15-mers] 26554, [iter time] 0.13s, [avg time] 0.15s, [total time] 0.03h
 [part] 782, [15-mers] 26588, [iter time] 0.21s, [avg time] 0.15s, [total time] 0.03h
 [part] 783, [15-mers] 26622, [iter time] 0.50s, [avg time] 0.15s, [total time] 0.03h
 [part] 784, [15-mers] 26656, [iter time] 0.04s, [avg time] 0.15s, [total time] 0.03h
 [part] 785, [15-mers] 26690, [iter time] 0.20s, [avg time] 0.15s, [total time] 0.03h
 [part] 786, [15-mers] 26724, [iter time] 0.24s, [avg time] 0.15s, [total time] 0.03h
 [part] 787, [15-mers] 26758, [iter time] 0.06s, [avg time] 0.15s, [total time] 0.03h
 [part] 788, [15-mers] 26792, [iter time] 0.44s, [avg time] 0.15s, [total time] 0.03h
 [part] 789, [15-mers] 26826, [iter time] 0.07s, [avg time] 0.15s, [total time] 0.03h
 [part] 790, [15-mers] 26860, [iter time] 0.32s, [avg time] 0.15s, [total time] 0.03h
 [part] 791, [15-mers] 26894, [iter time] 0.26s, [avg 

 [part] 876, [15-mers] 29784, [iter time] 0.22s, [avg time] 0.16s, [total time] 0.04h
 [part] 877, [15-mers] 29818, [iter time] 0.08s, [avg time] 0.16s, [total time] 0.04h
 [part] 878, [15-mers] 29852, [iter time] 0.17s, [avg time] 0.16s, [total time] 0.04h
 [part] 879, [15-mers] 29886, [iter time] 0.36s, [avg time] 0.16s, [total time] 0.04h
 [part] 880, [15-mers] 29920, [iter time] 0.06s, [avg time] 0.16s, [total time] 0.04h
 [part] 881, [15-mers] 29954, [iter time] 0.01s, [avg time] 0.16s, [total time] 0.04h
 [part] 882, [15-mers] 29988, [iter time] 0.31s, [avg time] 0.16s, [total time] 0.04h
 [part] 883, [15-mers] 30022, [iter time] 0.57s, [avg time] 0.16s, [total time] 0.04h
 [part] 884, [15-mers] 30056, [iter time] 0.11s, [avg time] 0.16s, [total time] 0.04h
 [part] 885, [15-mers] 30090, [iter time] 0.04s, [avg time] 0.16s, [total time] 0.04h
 [part] 886, [15-mers] 30124, [iter time] 0.42s, [avg time] 0.16s, [total time] 0.04h
 [part] 887, [15-mers] 30158, [iter time] 0.37s, [avg 

 [part] 972, [15-mers] 33048, [iter time] 0.41s, [avg time] 0.16s, [total time] 0.04h
 [part] 973, [15-mers] 33082, [iter time] 0.13s, [avg time] 0.16s, [total time] 0.04h
 [part] 974, [15-mers] 33116, [iter time] 0.11s, [avg time] 0.16s, [total time] 0.04h
 [part] 975, [15-mers] 33150, [iter time] 0.08s, [avg time] 0.16s, [total time] 0.04h
 [part] 976, [15-mers] 33184, [iter time] 0.30s, [avg time] 0.16s, [total time] 0.04h
 [part] 977, [15-mers] 33218, [iter time] 0.04s, [avg time] 0.16s, [total time] 0.04h
 [part] 978, [15-mers] 33252, [iter time] 0.12s, [avg time] 0.16s, [total time] 0.04h
 [part] 979, [15-mers] 33286, [iter time] 0.18s, [avg time] 0.16s, [total time] 0.04h
 [part] 980, [15-mers] 33320, [iter time] 0.04s, [avg time] 0.16s, [total time] 0.04h
 [part] 981, [15-mers] 33354, [iter time] 0.45s, [avg time] 0.16s, [total time] 0.04h
 [part] 982, [15-mers] 33388, [iter time] 0.07s, [avg time] 0.16s, [total time] 0.04h
 [part] 983, [15-mers] 33422, [iter time] 0.64s, [avg 

In [68]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 162.70s


Compared to the previous attempt, the present attempt finished in less than three minutes, and we designed exactly $1000$ non-repetitive RBSs per our goal.

Let's review the parts, and verify non-repetitiveness of the new toolbox against `bkg` by specifying it as a background to `Finder`. If `Finder` returns all $1000$ parts as non-repetitive, then our model function worked as intended despite us not using `bkg` as the `background` directly in our call to `Maker`.

In [69]:
toolbox3[0] # first RBS designed

'GUAUCGGCUACGAUACCGCAAAAAUCGUACUAAGGAGGACAUAUUAUG'

In [70]:
toolbox3[999] # last RBS designed

'CUACUCAAGUGAGUAAUUGGGACAAUUUGUUAAGGAGGAUGUGGAAUG'

In [71]:
assert len(nrpcalc.finder(
    seq_list=toolbox3.values(),
    Lmax=14,
    background=bkg)) == 1000 # our goal of 1000 brand new non-repetitive RBSs is achieved!


[Non-Repetitive Parts Calculator - Finder Mode]

[Checking Constraints]
 Sequence List   : 1000 parts
          Lmax   : 14 bp
 Internal Repeats: False

 Check Status: PASS

[Checking Background]
 Background: kmerSetDB stored at ./tmp_bkg/ with 43000 13-mers

 Check Status: PASS

[Checking Arguments]
   Vertex Cover: nrp2
   Output  File: None

 Check Status: PASS

Extracted 1000 unique sequences out of 1000 sequences in 0.002484 seconds

Written 1000 unique sequences out to ./447c2a71-1f2e-47dd-8512-14419fceb79e/seq_list.txt in 0.002212 seconds

 [Sequence processing remaining] = 1    
 [Cliques inserted] = 1000

Built homology graph in 1.404 seconds. [Edges = 0] [Nodes = 1000]
 [Intital Nodes = 1000] - [Repetitive Nodes = 0] = [Final Nodes = 1000]

 [+] Initial independent set = 0, computing vertex cover on remaining 0 nodes.
 [+] Vertex Cover Function: NRP 2-approximation
 [+] Dumping graph into: ./447c2a71-1f2e-47dd-8512-14419fceb79e/repeat_graph.txt in 0.00145721435546875 seconds

  [x] Isolated node 511 eliminated
  [x] Isolated node 549 eliminated
  [x] Isolated node 176 eliminated
  [x] Isolated node 113 eliminated
  [x] Isolated node 818 eliminated
  [x] Isolated node 461 eliminated
  [x] Isolated node 755 eliminated
  [x] Isolated node 142 eliminated
  [x] Isolated node 79 eliminated
  [x] Isolated node 768 eliminated
  [x] Isolated node 411 eliminated
  [x] Isolated node 705 eliminated
  [x] Isolated node 348 eliminated
  [x] Isolated node 29 eliminated
  [x] Isolated node 990 eliminated
  [x] Isolated node 671 eliminated
  [x] Isolated node 298 eliminated
  [x] Isolated node 592 eliminated
  [x] Isolated node 235 eliminated
  [x] Isolated node 940 eliminated
  [x] Isolated node 877 eliminated
  [x] Isolated node 504 eliminated
  [x] Isolated node 558 eliminated
  [x] Isolated node 185 eliminated
  [x] Isolated node 122 eliminated
  [x] Isolated node 827 eliminated
  [x] Isolated node 470 eliminated
  [x] Isolated node 764 eliminated
  [x] Isolated node 15

  [x] Isolated node 134 eliminated
  [x] Isolated node 71 eliminated
  [x] Isolated node 403 eliminated
  [x] Isolated node 697 eliminated
  [x] Isolated node 340 eliminated
  [x] Isolated node 634 eliminated
  [x] Isolated node 21 eliminated
  [x] Isolated node 982 eliminated
  [x] Isolated node 663 eliminated
  [x] Isolated node 290 eliminated
  [x] Isolated node 584 eliminated
  [x] Isolated node 227 eliminated
  [x] Isolated node 932 eliminated
  [x] Isolated node 869 eliminated
  [x] Isolated node 496 eliminated
  [x] Isolated node 550 eliminated
  [x] Isolated node 177 eliminated
  [x] Isolated node 114 eliminated
  [x] Isolated node 819 eliminated
  [x] Isolated node 462 eliminated
  [x] Isolated node 756 eliminated
  [x] Isolated node 143 eliminated
  [x] Isolated node 64 eliminated
  [x] Isolated node 769 eliminated
  [x] Isolated node 412 eliminated
  [x] Isolated node 706 eliminated
  [x] Isolated node 349 eliminated
  [x] Isolated node 30 eliminated
  [x] Isolated node 991 

 [+] Current independent set size:  1000
 [+] Potential nodes for expansion: 0 (projected independent set size: 1000)
 [X] Cannot expand independent set, terminating.

Non-Repetitive Toolbox Size: 1000
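As an independent sanity check on a handful of parts, repeats can also be counted by brute force. The sketch below is not how `Finder` works (it builds a homology graph, as the log above shows), and it is simplified to the forward strand only. The helper name `repeated_kmers` is made up for illustration, and the two sequences are the first and last RBSs printed above.

```python
from collections import Counter

def repeated_kmers(parts, Lmax):
    """Return the k-mers (k = Lmax + 1) that occur more than once across
    the given parts, within or between sequences. Forward strand only."""
    k = Lmax + 1
    counts = Counter(
        part[i:i+k]
        for part in parts
        for i in range(len(part) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}

parts = [
    'GUAUCGGCUACGAUACCGCAAAAAUCGUACUAAGGAGGACAUAUUAUG',  # toolbox3[0]
    'CUACUCAAGUGAGUAAUUGGGACAAUUUGUUAAGGAGGAUGUGGAAUG',  # toolbox3[999]
]

# No 15-mer is repeated, so the pair is non-repetitive at Lmax=14
print(repeated_kmers(parts, Lmax=14))                # set()

# The constant 9-bp SD motif is, of course, shared at a much lower Lmax
print('UAAGGAGGA' in repeated_kmers(parts, Lmax=8))  # True
```

This quadratic-free counting approach is fine for spot checks, but it does not choose which parts to keep when repeats exist, which is the job `Finder` does.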


### Non-Repetitive Toehold Switches with `Lmax=14`

We will now design $1000$ non-repetitive toehold RNA switches for programmable protein expression. Our constraints for designing these toehold switches are based on the work of [Green et al. (2014)](https://www.sciencedirect.com/science/article/pii/S0092867414012896).

Before we embark on the design, let's delete the previous `background` (`bkg`) and initialize a new `background` with `Lmax=14`, into which we'll insert all of the previous toolboxes, so that we can use it for designing our toehold switches non-repetitive to all previously designed parts.

In [72]:
bkg.drop() # deletes the background from disk

True

In [74]:
chained_bkg = nrpcalc.background(
    path='./chained_bkg', # a new background path
    Lmax=14)              # updated Lmax

In [75]:
chained_bkg # check new background path and content

kmerSetDB stored at ./chained_bkg/ with 0 15-mers

In [76]:
chained_bkg.multiadd(toolbox1.values()) # add the first promoter toolbox
chained_bkg.multiadd(toolbox2.values()) # add the second promoter toolbox
chained_bkg.multiadd(toolbox3.values()) # add the third toolbox containing RBSs


[Background Processing]


  Adding Seq 267: GCCGGGAATA...
  Adding Seq 499: CCCGGACATC...
  Adding Seq 268: CCAACATCTG...
  Adding Seq 499: AAGACTGCAC...
  Adding Seq 268: ACGUAAUACU...
  Adding Seq 533: GGUACUCAGC...
  Adding Seq 798: ACUGAGUACA...
  Adding Seq 999: CUACUCAAGU...

In [77]:
chained_bkg # review background post insertion

kmerSetDB stored at ./chained_bkg/ with 75000 15-mers
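The reported count can be cross-checked with simple arithmetic: a part of length $n$ contains $n - L_{max}$ windows of length $L_{max}+1$. The sketch below assumes that each window is stored once and that no window is shared between parts, which is what a non-repetitive toolbox implies. The helper `kmers_per_part` is a made-up name for illustration.

```python
def kmers_per_part(part_length, Lmax):
    """Number of (Lmax+1)-mer windows in a part of the given length."""
    return max(0, part_length - Lmax)

# Length of the RBSs designed above, taken from toolbox3[0]
rbs_length = len('GUAUCGGCUACGAUACCGCAAAAAUCGUACUAAGGAGGACAUAUUAUG')

per_rbs = kmers_per_part(rbs_length, Lmax=14)
print(per_rbs)            # 34, the per-part increment seen in the Maker log

rbs_total = 1000 * per_rbs
print(rbs_total)          # 34000 15-mers contributed by the RBS toolbox

# Whatever remains of the reported 75000 15-mers would then come from
# the two promoter toolboxes inserted first
print(75000 - rbs_total)  # 41000
```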

Our constraint for toehold switches primarily includes a hairpin loop, and contains a $30$-bp **trigger RNA** sequence upstream of an embedded $7$-bp consensus Shine-Dalgarno motif ('AGGAGGA'), separated from the start codon ('AUG') by the remaining $6$-bp stem of domain _B_, and ends with a $21$-bp linker sequence. Notably, everything upstream of the linker portion of the design has a very specific structure requirement.

Let's define the sequence and structure constraints and review them.

In [78]:
tb4_seq_constraint = 'N'*12 + 'N'*9 + 'N'*3 + 'N'*6 + 'AGGAGGA' + 'N'*6 + 'AUG' + 'N'*9 + 'N'*21
#                     -----    --------------------    -------     --------------------    -----
#                  Domain A     Domain B with Bulge    SD Motif    Domain B with START     LINKER

In [79]:
tb4_struct_constraint = 'xxxxxxxxxxxx(((((((((xxx((((((xxxxxxx))))))xxx))))))))).....................'

In [80]:
print('{:^30}'.format('Trigger RNA Sequence'))
print('-'*30)
print(tb4_seq_constraint)
print(tb4_struct_constraint)

     Trigger RNA Sequence     
------------------------------
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGGAGGANNNNNNAUGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
xxxxxxxxxxxx(((((((((xxx((((((xxxxxxx))))))xxx))))))))).....................


Although, at first glance, the constraints look fine and degenerate enough, there is a potential pitfall waiting for us when we feed these constraints to `Maker`.

Notice that the $7$-bp SD motif ('AGGAGGA') is flanked by domain _B_ bases on either side, which engenders a $15$-bp _k_-mer window with 'AGGAGGA' in the middle and four paired bases on either side. This is illustrated below.

In this $15$-mer window, the last four bases are always going to be complementary to the first four bases. So, as soon as the first four bases ('N's) are filled in by `Maker`, the fate of the last four bases is automatically determined (they will be complementary to the first four bases). The $7$-bp SD motif is a constant in our sequence constraint, which leaves only the first four bases to be selected variably by `Maker` (the last four bases become a dynamically inserted constant for each imaginable run). Thus, instead of working with a degenerate $15$-bp window with $7$ bases fixed, we're actually working with a $15$-bp window with $7+4=11$ bases fixed. This leaves us with $4^4 = 256$ possible nucleotide combinations to fill up this window, implying a theoretical maximum toolbox size of only $256$ toehold switches. Our goal of $1000$ non-repetitive toehold switches, even with `Lmax=14`, will not be fulfilled given how we have framed the sequence and structure constraints.
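The counting argument for this window can be sketched in a few lines of Python. The helpers `pair_table` and `free_choices` are hypothetical and not part of `nrpcalc`; they assume strict Watson-Crick complementarity (wobble pairs are ignored) and treat every character other than `(` and `)` in the structure constraint as unpaired.

```python
def pair_table(struct):
    """Map each paired index to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(struct):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def free_choices(seq, struct, start, k):
    """Count the 'N' positions in seq[start:start+k] that can still be
    chosen freely once base pairing inside the window is accounted for."""
    partner = pair_table(struct)
    free = 0
    for i in range(start, start + k):
        if seq[i] != 'N':
            continue  # fixed base, nothing to choose
        j = partner.get(i)
        if j is not None and start <= j < start + k and (j < i or seq[j] != 'N'):
            continue  # dictated by a partner that is earlier or fixed
        free += 1
    return free

# Same layout as the toehold constraints defined above
seq = 'N'*30 + 'AGGAGGA' + 'N'*6 + 'AUG' + 'N'*30
struct = ('x'*12 + '('*9 + 'x'*3 + '('*6 + 'x'*7 +
          ')'*6 + 'x'*3 + ')'*9 + '.'*21)

# The 15-bp window with the SD motif in the middle
window_start = seq.index('AGGAGGA') - 4
n_free = free_choices(seq, struct, window_start, 15)

print(seq[window_start:window_start+15])     # NNNNAGGAGGANNNN
print(struct[window_start:window_start+15])  # ((((xxxxxxx))))
print(n_free, 4**n_free)                     # 4 256
```

Only the four bases upstream of the motif are counted as free here, since each of the four downstream bases pairs with one of them inside the same window.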

Rather than abandoning hope, we may go back to the original paper for insights. It is clear that an RBS must be located between the two halves of domain _B_, but should this RBS consist of only a consensus $7$-bp SD motif? The motif can potentially be "padded" on either side, and still leave us with effective RBSs. Accordingly, we will modify our sequence constraint to pad the SD motif with three 'N's on the $5'$-end, while ensuring that those bases remain unpaired via our modified structure constraint. This expands our design space sixty-four-fold ($4^3 = 64$), making our goal of $1000$ non-repetitive toehold switches a real possibility. Let's re-define the constraints, and review them one last time.

In [81]:
tb4_seq_constraint = 'N'*12 + 'N'*9 + 'N'*3 + 'N'*6 + 'NNNAGGAGGA' + 'N'*6 + 'AUG' + 'N'*9 + 'N'*21
#                     -----    --------------------    ----------     --------------------    -----
#                  Domain A     Domain B with Bulge    Padded SD      Domain B with START     LINKER

In [82]:
tb4_struct_constraint = 'xxxxxxxxxxxx(((((((((xxx((((((xxxxxxxxxx))))))xxx))))))))).....................'

In [83]:
print('-'*30 + ' <-- Trigger RNA Sequence')
print(tb4_seq_constraint)
print(tb4_struct_constraint)

------------------------------ <-- Trigger RNA Sequence
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGGAGGANNNNNNAUGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
xxxxxxxxxxxx(((((((((xxx((((((xxxxxxxxxx))))))xxx))))))))).....................
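Before handing the revised constraints to `Maker`, it can help to sanity-check them in plain Python. The sketch below rebuilds both strings from their segment lengths (assumed equivalent to the literals above), confirms that they have equal length and balanced parentheses, and derives the positions of the SD motif and start codon, which are handy when writing position-dependent checks. `is_balanced` is a hypothetical helper, not an `nrpcalc` function.

```python
# Revised constraints, rebuilt segment by segment
tb4_seq_constraint = ('N'*12 + 'N'*9 + 'N'*3 + 'N'*6 + 'NNNAGGAGGA' +
                      'N'*6 + 'AUG' + 'N'*9 + 'N'*21)
tb4_struct_constraint = ('x'*12 + '('*9 + 'x'*3 + '('*6 + 'x'*10 +
                         ')'*6 + 'x'*3 + ')'*9 + '.'*21)

def is_balanced(struct):
    """True if every '(' has a matching ')' later in the string."""
    depth = 0
    for ch in struct:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:   # a ')' appeared before its '('
                return False
    return depth == 0

assert len(tb4_seq_constraint) == len(tb4_struct_constraint) == 79
assert is_balanced(tb4_struct_constraint)

# The padded SD motif must sit entirely in the forced-unpaired loop
assert tb4_struct_constraint[30:40] == 'x'*10

sd_end = tb4_seq_constraint.index('AGGAGGA') + 7   # one past the SD motif
aug_start = tb4_seq_constraint.index('AUG')        # first base of the start codon
print(sd_end, aug_start, aug_start + 3)  # 40 46 49
```

The three printed numbers mark the end of the SD motif, the start of the designated start codon, and the first base after it.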


Because we're dealing with toehold switches, and the design includes 'N's after the start codon, it is imperative to prevent any in-frame stop codons immediately after the start codon in the linker region. Similarly, it is important to prevent any start codon after the SD motif, in the $6$-bp spacer before the designated start codon. Time for a quick local model function.

> **Note** `Maker` builds and returns either **DNA** or **RNA** depending on the input `part_type` specification. The `part_type` is used to ensure correct base pairing, and to select the correct energy parameters for evaluating the structure constraint for the intended scenario. For example, toehold RNA switches are designed using the correct parameters so that when they finally fold in their RNA state, they have the correct conformation. This also means that all local and global model functions used for optimization should evaluate **DNA** or **RNA** strings, depending on the `part_type`, for the sake of correctness.
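A common pitfall that follows from the note above is a model function that silently never fires because it looks for DNA motifs in an RNA string. The helpers below are a hypothetical sketch (none of these names come from `nrpcalc`) showing one way to keep the alphabet explicit.

```python
def to_rna(seq):
    """Return seq in the RNA alphabet (T -> U), upper-cased."""
    return seq.upper().replace('T', 'U')

def to_dna(seq):
    """Return seq in the DNA alphabet (U -> T), upper-cased."""
    return seq.upper().replace('U', 'T')

def has_start_codon(seq, part_type='RNA'):
    """Hypothetical check that picks the codon spelling from part_type."""
    codon = 'AUG' if part_type == 'RNA' else 'ATG'
    return codon in seq

rna_part = 'CCAUGG'
print(has_start_codon(rna_part, part_type='RNA'))          # True
print(has_start_codon(rna_part, part_type='DNA'))          # False: 'ATG' never occurs in RNA
print(has_start_codon(to_dna(rna_part), part_type='DNA'))  # True
```

The middle call is the failure mode to avoid: the check passes every sequence, not because the motif is absent, but because it is spelled in the wrong alphabet.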

In [84]:
start_codon = 'AUG' # rather than 'ATG'

In [85]:
stop_codons = set(['UAG', 'UAA', 'UGA']) # all stop codons are defined

In [86]:
def prevent_codon(seq):
    # we don't evaluate if we're at or before the SD motif, or at
    # the designated start codon location and the two bases
    # right after the start codon (which do not form an in-frame codon)

    # case 1: at or before SD motif
    if len(seq) <= 40: # pass evaluation
        return (True, None)

    # case 2: at the designated start codon or the two
    # bases right next to it
    if 47 <= len(seq) <= 49+2: # pass evaluation
        return (True, None)

    # actual evaluation time!

    # case 3: we have entered the spacer between the SD motif
    # and the designated start codon
    if 41 <= len(seq) <= 46:
        # extract codon candidate
        cdn = seq[-3:]
        # is this a start codon?
        if cdn == start_codon:
            return (False, len(seq)-3) # go back three places
        # not a start codon
        return (True, None)

    # case 4: we have entered the linker region beyond the start codon
    # (the last three bases are checked at every position from here on,
    # so stop codons are rejected in all three frames, in-frame included)
    if len(seq) >= 52: # first in-frame codon after the start codon
        # extract codon candidate
        cdn = seq[-3:]
        # is the codon candidate a stop codon?
        if cdn in stop_codons:
            return (False, len(seq)-3) # go back three places
        # candidate is not a stop codon
        return (True, None) # pass

In [87]:
assert prevent_codon(seq='A'*25) == (True, None) # short sequences pass

In [89]:
assert prevent_codon(seq='A'*40 + 'AUG') == (False, 40) # start codon in spacer after SD prevented

In [90]:
assert prevent_codon(seq='A'*46 + 'AUG') == (True, None) # start codon at designated location not evaluated

In [91]:
assert prevent_codon(seq='A'*50 + 'UAA') == (False, 50) # stop codons after start codon prevented

In [92]:
assert prevent_codon(seq='A'*50 + 'GCA') == (True, None) # other codons are fine
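Note that `prevent_codon` inspects the last three bases at every length from $52$ onward, so it rejects stop codons in all three reading frames, which is stricter than the in-frame requirement stated earlier. If only in-frame stop codons should be rejected, a variant along the following lines could be used instead. This is a hypothetical sketch, not the function used in the run below: it covers only the stop codon check, reuses the `(True, None)` / `(False, index)` return convention from above, and assumes the designated start codon occupies 0-based positions $46$ to $48$.

```python
STOP_CODONS = {'UAG', 'UAA', 'UGA'}
START_CODON_END = 49  # assumption: designated AUG occupies 0-based positions 46..48

def prevent_inframe_stop(seq):
    """Hypothetical in-frame-only variant of the stop codon check."""
    n = len(seq)
    offset = n - START_CODON_END  # bases added after the start codon
    # only evaluate when a full in-frame codon has just been completed
    if offset < 3 or offset % 3 != 0:
        return (True, None)
    # the last three bases form an in-frame codon
    if seq[-3:] in STOP_CODONS:
        return (False, n - 3)  # go back three places
    return (True, None)

print(prevent_inframe_stop('A'*49 + 'UAA'))  # (False, 49): in-frame stop codon
print(prevent_inframe_stop('A'*50 + 'UAA'))  # (True, None): out-of-frame, tolerated
```

Restricting the check to one frame leaves `Maker` more freedom in the linker, at the cost of allowing stop codons in the other two frames.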

Let's now build our brand new toolbox of non-repetitive toehold switches and review them!

In [93]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox4 = nrpcalc.maker(
    seed=6,                                    # reproducible results
    seq_constr=tb4_seq_constraint,             # as defined above
    struct_constr=tb4_struct_constraint,       # as defined above
    Lmax=14,                                   # as defined above
    target_size=1000,                          # as stated in our goal
    part_type='RNA',                           # as stated in our goal
    struct_type='both',                        # as stated in our goal
    local_model_fn=prevent_codon,              # as defined above
    global_model_fn=None,                      # no requirement of a global check
    background=chained_bkg)                    # as defined above

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGGAGGANNNNNNAUGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
 Structure Constraint: xxxxxxxxxxxx(((((((((xxx((((((xxxxxxxxxx))))))xxx))))))))).....................
    Target Size      : 1000 parts
           Lmax      : 14 bp
  Internal Repeats   : False

 Check Status: PASS

[Checking Background]
 Background: kmerSetDB stored at ./chained_bkg/ with 75000 15-mers

 Check Status: PASS

[Checking Arguments]
   Part Type : RNA
 Struct Type : both
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [15-mers] 65, [iter time] 0.15s, [avg time] 0.15s, [total time] 0.00h
 [part] 2, [15-mers] 130, [iter time] 0.03s, [avg time] 0.09s, [total time] 0.00h
 [part] 3, [15-mers] 195, [iter time] 0.27s, [avg time] 0.15s, [total time] 0.00h
 [part] 4, [15-mers] 260, [iter time] 0.04s, [avg time] 0.12s, [tota

 [part] 92, [15-mers] 5980, [iter time] 0.34s, [avg time] 0.43s, [total time] 0.01h
 [part] 93, [15-mers] 6045, [iter time] 0.26s, [avg time] 0.43s, [total time] 0.01h
 [part] 94, [15-mers] 6110, [iter time] 0.23s, [avg time] 0.42s, [total time] 0.01h
 [part] 95, [15-mers] 6175, [iter time] 0.12s, [avg time] 0.42s, [total time] 0.01h
 [part] 96, [15-mers] 6240, [iter time] 0.06s, [avg time] 0.42s, [total time] 0.01h
 [part] 97, [15-mers] 6305, [iter time] 0.07s, [avg time] 0.41s, [total time] 0.01h
 [part] 98, [15-mers] 6370, [iter time] 0.19s, [avg time] 0.41s, [total time] 0.01h
 [part] 99, [15-mers] 6435, [iter time] 0.22s, [avg time] 0.41s, [total time] 0.01h
 [part] 100, [15-mers] 6500, [iter time] 0.05s, [avg time] 0.41s, [total time] 0.01h
 [part] 101, [15-mers] 6565, [iter time] 0.09s, [avg time] 0.40s, [total time] 0.01h
 [part] 102, [15-mers] 6630, [iter time] 0.34s, [avg time] 0.40s, [total time] 0.01h
 [part] 103, [15-mers] 6695, [iter time] 0.05s, [avg time] 0.40s, [total 

 [part] 189, [15-mers] 12285, [iter time] 0.21s, [avg time] 0.43s, [total time] 0.02h
 [part] 190, [15-mers] 12350, [iter time] 0.18s, [avg time] 0.43s, [total time] 0.02h
 [part] 191, [15-mers] 12415, [iter time] 0.02s, [avg time] 0.42s, [total time] 0.02h
 [part] 192, [15-mers] 12480, [iter time] 0.07s, [avg time] 0.42s, [total time] 0.02h
 [part] 193, [15-mers] 12545, [iter time] 0.03s, [avg time] 0.42s, [total time] 0.02h
 [part] 194, [15-mers] 12610, [iter time] 0.05s, [avg time] 0.42s, [total time] 0.02h
 [part] 195, [15-mers] 12675, [iter time] 0.02s, [avg time] 0.42s, [total time] 0.02h
 [part] 196, [15-mers] 12740, [iter time] 0.10s, [avg time] 0.42s, [total time] 0.02h
 [part] 197, [15-mers] 12805, [iter time] 0.13s, [avg time] 0.41s, [total time] 0.02h
 [part] 198, [15-mers] 12870, [iter time] 0.32s, [avg time] 0.41s, [total time] 0.02h
 [part] 199, [15-mers] 12935, [iter time] 0.13s, [avg time] 0.41s, [total time] 0.02h
 [part] 200, [15-mers] 13000, [iter time] 0.03s, [avg 

 [part] 287, [15-mers] 18655, [iter time] 0.11s, [avg time] 0.35s, [total time] 0.03h
 [part] 288, [15-mers] 18720, [iter time] 0.17s, [avg time] 0.35s, [total time] 0.03h
 [part] 289, [15-mers] 18785, [iter time] 0.16s, [avg time] 0.35s, [total time] 0.03h
 [part] 290, [15-mers] 18850, [iter time] 0.60s, [avg time] 0.35s, [total time] 0.03h
 [part] 291, [15-mers] 18915, [iter time] 0.06s, [avg time] 0.35s, [total time] 0.03h
 [part] 292, [15-mers] 18980, [iter time] 0.16s, [avg time] 0.35s, [total time] 0.03h
 [part] 293, [15-mers] 19045, [iter time] 0.37s, [avg time] 0.35s, [total time] 0.03h
 [part] 294, [15-mers] 19110, [iter time] 0.03s, [avg time] 0.35s, [total time] 0.03h
 [part] 295, [15-mers] 19175, [iter time] 0.06s, [avg time] 0.34s, [total time] 0.03h
 [part] 296, [15-mers] 19240, [iter time] 0.04s, [avg time] 0.34s, [total time] 0.03h
 [part] 297, [15-mers] 19305, [iter time] 25.58s, [avg time] 0.43s, [total time] 0.04h
 [part] 298, [15-mers] 19370, [iter time] 0.22s, [avg

 [part] 383, [15-mers] 24895, [iter time] 0.08s, [avg time] 0.65s, [total time] 0.07h
 [part] 384, [15-mers] 24960, [iter time] 0.14s, [avg time] 0.65s, [total time] 0.07h
 [part] 385, [15-mers] 25025, [iter time] 0.06s, [avg time] 0.65s, [total time] 0.07h
 [part] 386, [15-mers] 25090, [iter time] 0.26s, [avg time] 0.65s, [total time] 0.07h
 [part] 387, [15-mers] 25155, [iter time] 0.36s, [avg time] 0.64s, [total time] 0.07h
 [part] 388, [15-mers] 25220, [iter time] 0.17s, [avg time] 0.64s, [total time] 0.07h
 [part] 389, [15-mers] 25285, [iter time] 0.06s, [avg time] 0.64s, [total time] 0.07h
 [part] 390, [15-mers] 25350, [iter time] 0.04s, [avg time] 0.64s, [total time] 0.07h
 [part] 391, [15-mers] 25415, [iter time] 0.06s, [avg time] 0.64s, [total time] 0.07h
 [part] 392, [15-mers] 25480, [iter time] 0.93s, [avg time] 0.64s, [total time] 0.07h
 [part] 393, [15-mers] 25545, [iter time] 0.21s, [avg time] 0.64s, [total time] 0.07h
 [part] 394, [15-mers] 25610, [iter time] 0.10s, [avg 

 [part] 480, [15-mers] 31200, [iter time] 0.15s, [avg time] 0.56s, [total time] 0.07h
 [part] 481, [15-mers] 31265, [iter time] 0.24s, [avg time] 0.56s, [total time] 0.07h
 [part] 482, [15-mers] 31330, [iter time] 0.20s, [avg time] 0.56s, [total time] 0.07h
 [part] 483, [15-mers] 31395, [iter time] 29.11s, [avg time] 0.62s, [total time] 0.08h
 [part] 484, [15-mers] 31460, [iter time] 0.26s, [avg time] 0.62s, [total time] 0.08h
 [part] 485, [15-mers] 31525, [iter time] 0.16s, [avg time] 0.62s, [total time] 0.08h
 [part] 486, [15-mers] 31590, [iter time] 0.12s, [avg time] 0.62s, [total time] 0.08h
 [part] 487, [15-mers] 31655, [iter time] 0.05s, [avg time] 0.62s, [total time] 0.08h
 [part] 488, [15-mers] 31720, [iter time] 0.10s, [avg time] 0.61s, [total time] 0.08h
 [part] 489, [15-mers] 31785, [iter time] 0.32s, [avg time] 0.61s, [total time] 0.08h
 [part] 490, [15-mers] 31850, [iter time] 0.06s, [avg time] 0.61s, [total time] 0.08h
 [part] 491, [15-mers] 31915, [iter time] 0.23s, [avg

 [part] 576, [15-mers] 37440, [iter time] 30.53s, [avg time] 0.61s, [total time] 0.10h
 [part] 577, [15-mers] 37505, [iter time] 0.10s, [avg time] 0.61s, [total time] 0.10h
 [part] 578, [15-mers] 37570, [iter time] 0.69s, [avg time] 0.61s, [total time] 0.10h
 [part] 579, [15-mers] 37635, [iter time] 0.04s, [avg time] 0.61s, [total time] 0.10h
 [part] 580, [15-mers] 37700, [iter time] 0.15s, [avg time] 0.61s, [total time] 0.10h
 [part] 581, [15-mers] 37765, [iter time] 0.04s, [avg time] 0.61s, [total time] 0.10h
 [part] 582, [15-mers] 37830, [iter time] 0.03s, [avg time] 0.61s, [total time] 0.10h
 [part] 583, [15-mers] 37895, [iter time] 1.03s, [avg time] 0.61s, [total time] 0.10h
 [part] 584, [15-mers] 37960, [iter time] 0.07s, [avg time] 0.61s, [total time] 0.10h
 [part] 585, [15-mers] 38025, [iter time] 0.06s, [avg time] 0.61s, [total time] 0.10h
 [part] 586, [15-mers] 38090, [iter time] 0.13s, [avg time] 0.60s, [total time] 0.10h
 [part] 587, [15-mers] 38155, [iter time] 1.03s, [avg

 [part] 673, [15-mers] 43745, [iter time] 0.14s, [avg time] 0.70s, [total time] 0.13h
 [part] 674, [15-mers] 43810, [iter time] 0.20s, [avg time] 0.70s, [total time] 0.13h
 [part] 675, [15-mers] 43875, [iter time] 0.07s, [avg time] 0.69s, [total time] 0.13h
 [part] 676, [15-mers] 43940, [iter time] 0.22s, [avg time] 0.69s, [total time] 0.13h
 [part] 677, [15-mers] 44005, [iter time] 0.08s, [avg time] 0.69s, [total time] 0.13h
 [part] 678, [15-mers] 44070, [iter time] 0.43s, [avg time] 0.69s, [total time] 0.13h
 [part] 679, [15-mers] 44135, [iter time] 0.14s, [avg time] 0.69s, [total time] 0.13h
 [part] 680, [15-mers] 44200, [iter time] 0.83s, [avg time] 0.69s, [total time] 0.13h
 [part] 681, [15-mers] 44265, [iter time] 0.17s, [avg time] 0.69s, [total time] 0.13h
 [part] 682, [15-mers] 44330, [iter time] 0.36s, [avg time] 0.69s, [total time] 0.13h
 [part] 683, [15-mers] 44395, [iter time] 0.05s, [avg time] 0.69s, [total time] 0.13h
 [part] 684, [15-mers] 44460, [iter time] 0.04s, [avg 

 [part] 769, [15-mers] 49985, [iter time] 0.43s, [avg time] 0.64s, [total time] 0.14h
 [part] 770, [15-mers] 50050, [iter time] 0.56s, [avg time] 0.64s, [total time] 0.14h
 [part] 771, [15-mers] 50115, [iter time] 0.03s, [avg time] 0.63s, [total time] 0.14h
 [part] 772, [15-mers] 50180, [iter time] 0.21s, [avg time] 0.63s, [total time] 0.14h
 [part] 773, [15-mers] 50245, [iter time] 0.13s, [avg time] 0.63s, [total time] 0.14h
 [part] 774, [15-mers] 50310, [iter time] 0.08s, [avg time] 0.63s, [total time] 0.14h
 [part] 775, [15-mers] 50375, [iter time] 0.10s, [avg time] 0.63s, [total time] 0.14h
 [part] 776, [15-mers] 50440, [iter time] 0.37s, [avg time] 0.63s, [total time] 0.14h
 [part] 777, [15-mers] 50505, [iter time] 0.02s, [avg time] 0.63s, [total time] 0.14h
 [part] 778, [15-mers] 50570, [iter time] 0.07s, [avg time] 0.63s, [total time] 0.14h
 [part] 779, [15-mers] 50635, [iter time] 0.27s, [avg time] 0.63s, [total time] 0.14h
 [part] 780, [15-mers] 50700, [iter time] 0.11s, [avg 

 [part] 865, [15-mers] 56225, [iter time] 0.40s, [avg time] 0.71s, [total time] 0.17h
 [part] 866, [15-mers] 56290, [iter time] 0.28s, [avg time] 0.71s, [total time] 0.17h
 [part] 867, [15-mers] 56355, [iter time] 34.45s, [avg time] 0.74s, [total time] 0.18h
 [part] 868, [15-mers] 56420, [iter time] 0.24s, [avg time] 0.74s, [total time] 0.18h
 [part] 869, [15-mers] 56485, [iter time] 0.39s, [avg time] 0.74s, [total time] 0.18h
 [part] 870, [15-mers] 56550, [iter time] 0.04s, [avg time] 0.74s, [total time] 0.18h
 [part] 871, [15-mers] 56615, [iter time] 0.30s, [avg time] 0.74s, [total time] 0.18h
 [part] 872, [15-mers] 56680, [iter time] 0.19s, [avg time] 0.74s, [total time] 0.18h
 [part] 873, [15-mers] 56745, [iter time] 0.32s, [avg time] 0.74s, [total time] 0.18h
 [part] 874, [15-mers] 56810, [iter time] 0.09s, [avg time] 0.74s, [total time] 0.18h
 [part] 875, [15-mers] 56875, [iter time] 0.30s, [avg time] 0.74s, [total time] 0.18h
 [part] 876, [15-mers] 56940, [iter time] 0.19s, [avg

 [part] 963, [15-mers] 62595, [iter time] 0.35s, [avg time] 0.73s, [total time] 0.20h
 [part] 964, [15-mers] 62660, [iter time] 0.08s, [avg time] 0.73s, [total time] 0.20h
 [part] 965, [15-mers] 62725, [iter time] 0.54s, [avg time] 0.73s, [total time] 0.20h
 [part] 966, [15-mers] 62790, [iter time] 0.07s, [avg time] 0.73s, [total time] 0.20h
 [part] 967, [15-mers] 62855, [iter time] 0.15s, [avg time] 0.73s, [total time] 0.20h
 [part] 968, [15-mers] 62920, [iter time] 0.31s, [avg time] 0.73s, [total time] 0.20h
 [part] 969, [15-mers] 62985, [iter time] 0.64s, [avg time] 0.73s, [total time] 0.20h
 [part] 970, [15-mers] 63050, [iter time] 0.08s, [avg time] 0.73s, [total time] 0.20h
 [part] 971, [15-mers] 63115, [iter time] 0.20s, [avg time] 0.73s, [total time] 0.20h
 [part] 972, [15-mers] 63180, [iter time] 0.02s, [avg time] 0.73s, [total time] 0.20h
 [part] 973, [15-mers] 63245, [iter time] 35.93s, [avg time] 0.76s, [total time] 0.21h
 [part] 974, [15-mers] 63310, [iter time] 0.15s, [avg

In [94]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Run took 785.20s


In [95]:
toolbox4[0] # first switch in the toolbox

'CGUCUCCCCUUCCGUCGGCAGAUAUAUUGUAGAAGGAGGAACAAUAAUGCUGCCGACGCACUGCCGUAUAGUUUAGAAC'

In [96]:
toolbox4[999] # last switch in the toolbox

'UCCCCGAAUAUGCCCUAAAUAGUGGUGCCGGGAAGGAGGACGGCACAUGUAUUUAGGGGAAAUCAGUUUUAUUGGCAAU'
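As a quick independent check, the two switches printed above can be compared against the constraints in plain Python. The sketch below verifies that each sequence matches the sequence constraint position by position and that every base pair demanded by the structure constraint is formable (Watson-Crick or G-U wobble). This is a necessary condition only and does not replace actual folding; all helper names are illustrative and not part of `nrpcalc`.

```python
# Only the symbols used in this constraint are listed
IUPAC = {'A': 'A', 'C': 'C', 'G': 'G', 'U': 'U', 'N': 'ACGU'}
# Pairs allowed in RNA stems, including G-U wobble
PAIRS = {('A', 'U'), ('U', 'A'), ('G', 'C'), ('C', 'G'), ('G', 'U'), ('U', 'G')}

seq_constraint = 'N'*33 + 'AGGAGGA' + 'N'*6 + 'AUG' + 'N'*30
struct_constraint = ('x'*12 + '('*9 + 'x'*3 + '('*6 + 'x'*10 +
                     ')'*6 + 'x'*3 + ')'*9 + '.'*21)

def matches_sequence(seq, constraint):
    """True if seq has the right length and obeys every constraint symbol."""
    return len(seq) == len(constraint) and all(
        base in IUPAC[sym] for base, sym in zip(seq, constraint))

def paired_positions(struct):
    """Return (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, ch in enumerate(struct):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            pairs.append((stack.pop(), i))
    return pairs

def can_pair(seq, struct):
    """True if every required pair in struct is formable in seq."""
    return all((seq[i], seq[j]) in PAIRS for i, j in paired_positions(struct))

switches = [
    'CGUCUCCCCUUCCGUCGGCAGAUAUAUUGUAGAAGGAGGAACAAUAAUGCUGCCGACGCACUGCCGUAUAGUUUAGAAC',
    'UCCCCGAAUAUGCCCUAAAUAGUGGUGCCGGGAAGGAGGACGGCACAUGUAUUUAGGGGAAAUCAGUUUUAUUGGCAAU',
]
for s in switches:
    print(matches_sequence(s, seq_constraint), can_pair(s, struct_constraint))
# True True
# True True
```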

In [97]:
assert len(nrpcalc.finder(
    seq_list=toolbox4.values(),
    Lmax=14,
    background=chained_bkg)) == 1000 # job done!


[Non-Repetitive Parts Calculator - Finder Mode]

[Checking Constraints]
 Sequence List   : 1000 parts
          Lmax   : 14 bp
 Internal Repeats: False

 Check Status: PASS

[Checking Background]
 Background: kmerSetDB stored at ./chained_bkg/ with 75000 15-mers

 Check Status: PASS

[Checking Arguments]
   Vertex Cover: nrp2
   Output  File: None

 Check Status: PASS

Extracted 1000 unique sequences out of 1000 sequences in 0.001104 seconds

Written 1000 unique sequences out to ./97b0be17-01ac-42f5-ad56-3ef1eb61cae5/seq_list.txt in 0.00128 seconds

 [Sequence processing remaining] = 1    
 [Cliques inserted] = 1000

Built homology graph in 1.541 seconds. [Edges = 0] [Nodes = 1000]
 [Intital Nodes = 1000] - [Repetitive Nodes = 0] = [Final Nodes = 1000]

 [+] Initial independent set = 0, computing vertex cover on remaining 0 nodes.
 [+] Vertex Cover Function: NRP 2-approximation
 [+] Dumping graph into: ./97b0be17-01ac-42f5-ad56-3ef1eb61cae5/repeat_graph.txt in 0.0016946792602539062 se

  [x] Isolated node 90 eliminated
  [x] Isolated node 795 eliminated
  [x] Isolated node 438 eliminated
  [x] Isolated node 732 eliminated
  [x] Isolated node 375 eliminated
  [x] Isolated node 40 eliminated
  [x] Isolated node 388 eliminated
  [x] Isolated node 682 eliminated
  [x] Isolated node 325 eliminated
  [x] Isolated node 619 eliminated
  [x] Isolated node 6 eliminated
  [x] Isolated node 967 eliminated
  [x] Isolated node 888 eliminated
  [x] Isolated node 275 eliminated
  [x] Isolated node 569 eliminated
  [x] Isolated node 212 eliminated
  [x] Isolated node 917 eliminated
  [x] Isolated node 854 eliminated
  [x] Isolated node 481 eliminated
  [x] Isolated node 535 eliminated
  [x] Isolated node 162 eliminated
  [x] Isolated node 99 eliminated
  [x] Isolated node 804 eliminated
  [x] Isolated node 447 eliminated
  [x] Isolated node 741 eliminated
  [x] Isolated node 368 eliminated
  [x] Isolated node 49 eliminated
  [x] Isolated node 397 eliminated
  [x] Isolated node 691 el

  [x] Isolated node 889 eliminated
  [x] Isolated node 276 eliminated
  [x] Isolated node 570 eliminated
  [x] Isolated node 213 eliminated
  [x] Isolated node 918 eliminated
  [x] Isolated node 855 eliminated
  [x] Isolated node 482 eliminated
  [x] Isolated node 520 eliminated
  [x] Isolated node 163 eliminated
  [x] Isolated node 100 eliminated
  [x] Isolated node 805 eliminated
  [x] Isolated node 432 eliminated
  [x] Isolated node 742 eliminated
  [x] Isolated node 369 eliminated
  [x] Isolated node 50 eliminated
  [x] Isolated node 398 eliminated
  [x] Isolated node 692 eliminated
  [x] Isolated node 335 eliminated
  [x] Isolated node 629 eliminated
  [x] Isolated node 0 eliminated
  [x] Isolated node 961 eliminated
  [x] Isolated node 642 eliminated
  [x] Isolated node 285 eliminated
  [x] Isolated node 579 eliminated
  [x] Isolated node 222 eliminated
  [x] Isolated node 927 eliminated
  [x] Isolated node 848 eliminated
  [x] Isolated node 491 eliminated
  [x] Isolated node 529

### Non-Repetitive Intrinsic Terminators with `Lmax=14`

For our final demonstration, we will design $1000$ non-repetitive rho-independent bacterial terminators based on the work of [Chen et al. (2013)](https://www.nature.com/articles/nmeth.2515).

Our design includes a highly degenerate sequence constraint with embedded poly-A and poly-U motifs, and a $15$-bp stem in the structure constraint. Based on the paper, the $8$-bp U-rich tract is $8$ bases downstream of the stem, and pairs with the complementary A-rich tract immediately upstream of the stem. We will not be using any model functions in this example, but we will ensure that the terminators are non-repetitive to all of the toolboxes designed above.

In [98]:
#                                A Tract|Strong Bases       Strong Bases         U Tract   
#                                -------|-----                    ------        --------   
tb5_seq_constraint    = 'NNNNNNNNAAAAAAAASNSNSNNNNNNNNNNNNNNNNNNNNNNNNNNNSNSNSNNNNNNNNUUUUUUUUNNNNNNNN'
tb5_struct_constraint = '........(((((((((((((((((((((((xxxxxxx)))))))))))))))xxxxxxxx))))))))........'
#                                        ---------------
#                                        15-bp Stem

Note that the terminator structure constraint mandates a $15$-bp stem, which implies that all designed terminators must have an internal repeat of $15$ bases, yet our desired `Lmax` is $14$. In such scenarios, we can set `internal_repeats=True`, and ask `Maker` to preserve parts with internal repeats, while still eliminating shared repeats between all pairs of parts.
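To see why a $15$-bp stem collides with `Lmax=14`, consider the toy hairpin below. The sketch assumes, as the paragraph above implies, that a _k_-mer and its reverse complement count as the same repeat, and it uses strict Watson-Crick pairing. `has_internal_repeat` is a hypothetical helper written for illustration, not the check `nrpcalc` performs internally.

```python
def revcomp_rna(seq):
    """Reverse complement under strict Watson-Crick pairing (assumption)."""
    pair = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(pair[b] for b in reversed(seq))

def has_internal_repeat(seq, Lmax):
    """True if any (Lmax+1)-mer, or its reverse complement, occurs twice in seq."""
    k = Lmax + 1
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i+k]
        # a k-mer and its reverse complement share one canonical form
        canonical = min(kmer, revcomp_rna(kmer))
        if canonical in seen:
            return True
        seen.add(canonical)
    return False

stem = 'GCGCCUAGGAUCCGA'                        # arbitrary 15-base 5' arm
hairpin = stem + 'GCAAUAA' + revcomp_rna(stem)  # 5' arm + loop + 3' arm
print(has_internal_repeat(hairpin, Lmax=14))    # True
```

The $3'$ arm of any perfect $15$-bp stem is the reverse complement of its $5'$ arm, so such a part always carries a $15$-base internal repeat regardless of which bases are chosen.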

Let's update our `background` object with the previously designed toehold switches, and then design the terminators.

In [99]:
chained_bkg.multiadd(toolbox4.values()) # update background with toehold switches


[Background Processing]


  Adding Seq 267: UCAUCAGUGU...

  Adding Seq 532: CGCAUGACAU...

  Adding Seq 797: AAUGUAGAAC...

  Adding Seq 999: UCCCCGAAUA...
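The effect of this update on the background can be estimated with simple _k_-mer accounting. The sketch below assumes that none of the new $15$-mers is already present in the background (which is what a non-repetitive toolbox should guarantee) and takes the starting count of $75000$ from the background check reported in the toehold switch run above; `kmers_per_part` is an illustrative helper.

```python
def kmers_per_part(part_length, k):
    """Number of k-mer windows in a part of the given length."""
    return max(part_length - k + 1, 0)

k = 14 + 1                            # background stores (Lmax+1)-mers
per_switch = kmers_per_part(79, k)    # each toehold switch is 79 bases long
added = 1000 * per_switch             # assuming no 15-mer is shared between switches
print(per_switch, added, 75000 + added)  # 65 65000 140000
```

The per-part figure agrees with the `[15-mers]` counter in the `Maker` log above, which grows by $65$ with every accepted switch.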

In [None]:
# Record starting time
t0 = time.time()

# Execute Maker
toolbox5 = nrpcalc.maker(
    seed=7,                                    # reproducible results
    seq_constr=tb5_seq_constraint,             # as defined above
    struct_constr=tb5_struct_constraint,       # as defined above
    Lmax=14,                                   # as defined above
    internal_repeats=True,                     # as stated in our goal
    target_size=1000,                          # as stated in our goal
    part_type='RNA',                           # as stated in our goal
    struct_type='both',                        # as stated in our goal
    local_model_fn=None,                       # no requirement of a local check
    global_model_fn=None,                      # no requirement of a global check
    background=chained_bkg)                    # as defined above

# Compute execution time
tf = time.time() - t0


[Non-Repetitive Parts Calculator - Maker Mode]

[Checking Constraints]
  Sequence Constraint: NNNNNNNNAAAAAAAASNSNSNNNNNNNNNNNNNNNNNNNNNNNNNNNSNSNSNNNNNNNNUUUUUUUUNNNNNNNN
 Structure Constraint: ........(((((((((((((((((((((((xxxxxxx)))))))))))))))xxxxxxxx))))))))........
    Target Size      : 1000 parts
           Lmax      : 14 bp
  Internal Repeats   : True

 Check Status: PASS

[Checking Background]
 Background: kmerSetDB stored at ./chained_bkg/ with 140000 15-mers

 Check Status: PASS

[Checking Arguments]
   Part Type : RNA
 Struct Type : both
  Synth Opt  : False
   Jump Count: 10
   Fail Count: 1000
 Output File : None

 Check Status: PASS

Constructing Toolbox:

 [part] 1, [15-mers] 62, [iter time] 0.35s, [avg time] 0.35s, [total time] 0.00h
 [part] 2, [15-mers] 124, [iter time] 0.58s, [avg time] 0.47s, [total time] 0.00h
 [part] 3, [15-mers] 186, [iter time] 1.33s, [avg time] 0.75s, [total time] 0.00h
 [part] 4, [15-mers] 248, [iter time] 0.33s, [avg time] 0.65s, [total ti

 [part] 92, [15-mers] 5704, [iter time] 1.47s, [avg time] 0.46s, [total time] 0.01h
 [part] 93, [15-mers] 5766, [iter time] 0.27s, [avg time] 0.45s, [total time] 0.01h
 [part] 94, [15-mers] 5828, [iter time] 0.16s, [avg time] 0.45s, [total time] 0.01h
 [part] 95, [15-mers] 5890, [iter time] 0.18s, [avg time] 0.45s, [total time] 0.01h
 [part] 96, [15-mers] 5952, [iter time] 0.27s, [avg time] 0.45s, [total time] 0.01h
 [part] 97, [15-mers] 6014, [iter time] 0.09s, [avg time] 0.44s, [total time] 0.01h
 [part] 98, [15-mers] 6076, [iter time] 1.46s, [avg time] 0.45s, [total time] 0.01h
 [part] 99, [15-mers] 6138, [iter time] 0.17s, [avg time] 0.45s, [total time] 0.01h
 [part] 100, [15-mers] 6200, [iter time] 0.74s, [avg time] 0.45s, [total time] 0.01h
 [part] 101, [15-mers] 6262, [iter time] 0.17s, [avg time] 0.45s, [total time] 0.01h
 [part] 102, [15-mers] 6324, [iter time] 0.18s, [avg time] 0.45s, [total time] 0.01h
 [part] 103, [15-mers] 6386, [iter time] 0.13s, [avg time] 0.44s, [total 

 [part] 190, [15-mers] 11780, [iter time] 0.38s, [avg time] 0.51s, [total time] 0.03h
 [part] 191, [15-mers] 11842, [iter time] 0.68s, [avg time] 0.51s, [total time] 0.03h
 [part] 192, [15-mers] 11904, [iter time] 1.32s, [avg time] 0.51s, [total time] 0.03h
 [part] 193, [15-mers] 11966, [iter time] 0.12s, [avg time] 0.51s, [total time] 0.03h
 [part] 194, [15-mers] 12028, [iter time] 0.47s, [avg time] 0.51s, [total time] 0.03h
 [part] 195, [15-mers] 12090, [iter time] 0.16s, [avg time] 0.51s, [total time] 0.03h
 [part] 196, [15-mers] 12152, [iter time] 0.02s, [avg time] 0.51s, [total time] 0.03h
 [part] 197, [15-mers] 12214, [iter time] 1.51s, [avg time] 0.51s, [total time] 0.03h
 [part] 198, [15-mers] 12276, [iter time] 0.41s, [avg time] 0.51s, [total time] 0.03h
 [part] 199, [15-mers] 12338, [iter time] 0.19s, [avg time] 0.51s, [total time] 0.03h
 [part] 200, [15-mers] 12400, [iter time] 0.08s, [avg time] 0.51s, [total time] 0.03h
 [part] 201, [15-mers] 12462, [iter time] 0.20s, [avg 

 [part] 286, [15-mers] 17732, [iter time] 0.97s, [avg time] 0.51s, [total time] 0.04h
 [part] 287, [15-mers] 17793, [iter time] 0.43s, [avg time] 0.51s, [total time] 0.04h
 [part] 288, [15-mers] 17855, [iter time] 0.08s, [avg time] 0.51s, [total time] 0.04h
 [part] 289, [15-mers] 17917, [iter time] 0.33s, [avg time] 0.51s, [total time] 0.04h
 [part] 290, [15-mers] 17979, [iter time] 0.11s, [avg time] 0.51s, [total time] 0.04h
 [part] 291, [15-mers] 18041, [iter time] 0.32s, [avg time] 0.51s, [total time] 0.04h
 [part] 292, [15-mers] 18103, [iter time] 0.50s, [avg time] 0.51s, [total time] 0.04h
 [part] 293, [15-mers] 18165, [iter time] 0.39s, [avg time] 0.51s, [total time] 0.04h
 [part] 294, [15-mers] 18227, [iter time] 0.02s, [avg time] 0.50s, [total time] 0.04h
 [part] 295, [15-mers] 18289, [iter time] 0.24s, [avg time] 0.50s, [total time] 0.04h
 [part] 296, [15-mers] 18351, [iter time] 0.45s, [avg time] 0.50s, [total time] 0.04h
 [part] 297, [15-mers] 18413, [iter time] 0.02s, [avg 

 [part] 382, [15-mers] 23683, [iter time] 0.79s, [avg time] 0.49s, [total time] 0.05h
 [part] 383, [15-mers] 23745, [iter time] 0.12s, [avg time] 0.49s, [total time] 0.05h
 [part] 384, [15-mers] 23807, [iter time] 0.23s, [avg time] 0.49s, [total time] 0.05h
 [part] 385, [15-mers] 23869, [iter time] 0.09s, [avg time] 0.49s, [total time] 0.05h
 [part] 386, [15-mers] 23931, [iter time] 0.19s, [avg time] 0.49s, [total time] 0.05h
 [part] 387, [15-mers] 23993, [iter time] 0.46s, [avg time] 0.49s, [total time] 0.05h
 [part] 388, [15-mers] 24055, [iter time] 0.09s, [avg time] 0.49s, [total time] 0.05h
 [part] 389, [15-mers] 24117, [iter time] 0.48s, [avg time] 0.49s, [total time] 0.05h
 [part] 390, [15-mers] 24179, [iter time] 0.13s, [avg time] 0.49s, [total time] 0.05h
 [part] 391, [15-mers] 24241, [iter time] 0.31s, [avg time] 0.49s, [total time] 0.05h
 [part] 392, [15-mers] 24303, [iter time] 0.07s, [avg time] 0.49s, [total time] 0.05h
 [part] 393, [15-mers] 24365, [iter time] 0.70s, [avg 

In [None]:
print('Run took {:.2f}s'.format(tf)) # Show run time

Let's review our designed terminators, and ensure that all toolboxes are non-repetitive to each other, as a final check.

In [None]:
toolbox5[0] # first terminator designed

In [None]:
toolbox5[999] # last terminator designed

In [None]:
all_toolboxes = [] # our final toolbox list
# insert all toolboxes designed so far into all_toolboxes
for toolbox in [toolbox1, toolbox2, toolbox3, toolbox4, toolbox5]:
    all_toolboxes.extend(toolbox.values())

In [None]:
assert len(nrpcalc.finder(
    seq_list=all_toolboxes, # check the pooled parts, not just one toolbox
    Lmax=14)) == 4000 # all toolboxes we designed may be used simultaneously
                      # without introducing any repeat longer than 14-bp
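
To make the assertion above more concrete: requiring `Lmax=14` means that no 15-bp window may occur more than once across the pooled parts. The following is only a minimal plain-Python sketch of that idea, not how `NRP Calculator` implements it. The helper name `shared_kmers` and the toy sequences are hypothetical, and the sketch compares forward-strand k-mers only, so it is a simplification of whatever `Finder` checks internally.

```python
def shared_kmers(seqs, lmax=14):
    """Return the set of (lmax+1)-mers seen more than once across seqs.

    Hypothetical helper for illustration only; it compares
    forward-strand k-mers and also counts repeats inside one sequence.
    """
    k = lmax + 1
    seen = set()     # every k-mer encountered so far
    repeats = set()  # k-mers encountered at least twice
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in seen:
                repeats.add(kmer)
            seen.add(kmer)
    return repeats

# Toy parts (made up for this sketch), checked with a small Lmax
toy_parts = ['ACGTACGGTCA', 'TTGACCGTAAG', 'GGCATCGATTC']
print(shared_kmers(toy_parts, lmax=4))

# Adding a part that reuses a stretch of the first one introduces repeats
print(sorted(shared_kmers(toy_parts + ['AAACGTACTTT'], lmax=4)))
```

An empty result corresponds to the situation the `Finder` assertion is testing for: every part in the pooled list can coexist without sharing a window longer than `Lmax`.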

Notice that we didn't specify `chained_bkg` as the `background` in the `Finder` job above, because it already contains the _15_-mers from the previous four toolboxes and would therefore reject those very parts. We will now dispense with `chained_bkg` since it has served its purpose.

In [None]:
chained_bkg.drop() # goodbye!

### And Now, Our Watch is Ended

We hope this notebook is useful to you in learning how to use `NRP Calculator` effectively. We hope to convince you that `NRP Calculator` can be a useful tool in your arsenal in your quest for genetic systems engineering. We had a lot of fun developing this notebook, and we hope you'll share it with your students and colleagues who might benefit from `NRP Calculator`. Despite our efforts at clarity, if any part of the notebook remains ambiguous or inaccessible to you, please reach out to the authors, who'd be more than delighted to help you parse the information.

We'd like to stress that the genetic parts discussed above are not the only ones that can be designed using `NRP Calculator`. For example, at one point, we were interested in showing the design of non-repetitive sgRNA handles from [Reis et al. (2019)](https://www.nature.com/articles/s41587-019-0286-9) and non-repetitive ribozymes based on the work of [Nielsen et al. (2016)](https://science.sciencemag.org/content/352/6281/aac7341), both of which are amazing reads and have been influential in the design of `NRP Calculator`. Instead, we focused on commonly used genetic parts that are, in our opinion, more accessible to a broader audience, with an emphasis on how to use `NRP Calculator` effectively in designing them. This tool and notebook are left to our synthetic biology colleagues everywhere to help them engineer ever-larger and stable genetic systems.

### References

* Reis, A. C., Halper, S. M., Vezeau, G. E., Cetnar, D. P., Hossain, A., Clauer, P. R., and Salis, H. M. (2019). Simultaneous repression of multiple bacterial genes using nonrepetitive extra-long sgRNA arrays. Nature Biotechnology, 37(11), 1294-1301.


* Larson, M. H., Gilbert, L. A., Wang, X., Lim, W. A., Weissman, J. S., and Qi, L. S. (2013). CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nature Protocols, 8(11), 2180-2196.


* Salis, H. M., Mirsky, E. A., and Voigt, C. A. (2009). Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology, 27(10), 946-950.


* Green, A. A., Silver, P. A., Collins, J. J., and Yin, P. (2014). Toehold switches: de-novo-designed regulators of gene expression. Cell, 159(4), 925-939.


* Chen, Y. J., Liu, P., Nielsen, A. A., Brophy, J. A., Clancy, K., Peterson, T., and Voigt, C. A. (2013). Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nature Methods, 10(7), 659-664.


* Nielsen, A. A., Der, B. S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski, E. A., Ross, D., Densmore, D., and Voigt, C. A. (2016). Genetic circuit design automation. Science, 352(6281).


* Hossain, A., Lopez, E., Halper, S. M., Cetnar, D. P., Reis, A. C., Strickland, D., Klavins, E., and Salis, H. M. (2020). Automated design of thousands of nonrepetitive parts for engineering stable genetic systems. Nature Biotechnology, 1-10.