# Optimizing protein stability using POLi

In this chapter, we discuss how an objective function should _ideally_ be optimized in `POLi`. The current implementation may still diverge from this description, but it is what we aim for.

## Stability optimization is a registered problem


`POLi`'s registry has a `get_problems()` method, which returns the names of the objective functions that are already registered. It works by loading a `config.rc` file located in `src/poli/core`.

In [1]:
from poli.core.registry import get_problems
get_problems()

['MY_PROBLEM', 'SMB', 'FoldX_stability']

As you can see, `FoldX_stability` is already registered as a problem factory.
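
To make the registry mechanism more concrete, here is a minimal sketch of reading problem names from an INI-style file with Python's standard `configparser`. The file layout (one section per problem), the `run_script_location` key, and the helper `list_registered_problems` are assumptions made for illustration; the actual `config.rc` format and `poli`'s internals may differ.

```python
import configparser
import tempfile
from pathlib import Path


def list_registered_problems(config_path: Path) -> list[str]:
    """Return the section names of an INI-style config file, one per problem."""
    config = configparser.ConfigParser()
    config.read(config_path)
    return config.sections()


# Hypothetical file contents: one section per registered problem.
example_config = """\
[MY_PROBLEM]
run_script_location = /path/to/my_problem.sh

[FoldX_stability]
run_script_location = /path/to/foldx_stability.sh
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.rc"
    config_path.write_text(example_config)
    registered = list_registered_problems(config_path)

print(registered)
```

Under these assumptions, registering a new problem amounts to adding a section to the file, and listing problems amounts to reading the section names back.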

:::{admonition} You don't have it?

If `FoldX_stability` is not part of your registered problems, check `examples/adding_foldx_stability_as_objective_function` in `poli_baselines`. You should have it after running the script called `foldx_stability_objective_function.py`!

:::

Let's stick with it as a problem name:

In [2]:
problem_name = "FoldX_stability"

## Optimizing `mRouge`

In this example, we will focus on optimizing [`mRouge`, also known as `3NED`](https://www.rcsb.org/structure/3NED), one of the red fluorescent proteins explored in `LaMBO` {cite:p}`stanton2022accelerating`. Before optimization, we need to download the PDB file and "repair" it (see [single mutations using foldx](../../understanding_foldx/01-single-mutation-using-foldx/index.ipynb)).

We assume that the repaired file is already in the current working directory: [TODO: expand/automate the process of repair].

In [3]:
!ls

3ned_Repair.pdb                    optimizing_protein_stability.ipynb


In [4]:
from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True

True

## Defining the objective function

In this tutorial, we optimize the stability of `mRouge` using the `FoldX_stability` problem factory. The first step is creating the problem:

:::{warning}

In general, it is a good idea to check how to create instances of individual problems in their documentation, since they might need extra inputs. [TODO: add where to find these].

`FoldX_stability` only needs one extra keyword argument: a `wildtype_pdb_path`. `poli` will hopefully remind you what you forgot with its error messages.

:::

In [5]:
from poli import objective_factory

problem_info, f, x0, y0, run_info = objective_factory.create(
    name=problem_name,
    caller_info=None,
    observer=None,
    wildtype_pdb_path=wildtype_pdb_path
)

   ********************************************
   ***                                      ***
   ***             FoldX 4 (c)              ***
   ***                                      ***
   ***     code by the FoldX Consortium     ***
   ***                                      ***
   ***     Jesper Borg, Frederic Rousseau   ***
   ***    Joost Schymkowitz, Luis Serrano   ***
   ***    Peter Vanhee, Erik Verschueren    ***
   ***     Lies Baeten, Javier Delgado      ***
   ***       and Francois Stricher          ***
   *** and any other of the 9! permutations ***
   ***   based on an original concept by    ***
   ***   Raphael Guerois and Luis Serrano   ***
   ********************************************

1 models read: 3ned_Repair.pdb

BackHbond       =               -178.70
SideHbond       =               -76.61
Energy_VdW      =               -267.80
Electro         =               -13.75
Energy_SolvP    =               374.21
Energy_SolvH    =               -351.07
Energy_vdw

`objective_factory.create` returns five things:

1. `problem_info`, a description of the problem, including useful attributes like `alphabet` or `max_sequence_length`. (See more [here (TODO: ADD)]()).
2. `f: AbstractBlackBox`, a black-box function from `poli`.
3. `x0: np.ndarray`, an initial design.
4. `y0: np.ndarray`, the corresponding initial evaluation.
5. `run_info`, which appears to be the output of the observer (to be confirmed).

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the `AbstractSolver` in `poli_baselines.core.abstract_solver`).

## Optimizing using a `RandomMutation` solver

In this tutorial, we use the simplest baseline for discrete sequence optimization: `RandomMutation`. It takes the best-performing sequence found so far, selects one position at random, and replaces the element at that position with another element of the alphabet.
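
The mutation step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code inside `poli_baselines`: the helper `random_mutation` is hypothetical, sequences are plain lists of integer tokens, and the replacement token is forced to differ from the current one (the real solver may not enforce this).

```python
import random


def random_mutation(
    sequence: list[int], alphabet_size: int, rng: random.Random
) -> list[int]:
    """Return a copy of `sequence` with one randomly chosen position changed."""
    mutant = list(sequence)
    position = rng.randrange(len(mutant))
    # Draw from the other alphabet_size - 1 tokens so the mutant always differs.
    choices = [token for token in range(alphabet_size) if token != mutant[position]]
    mutant[position] = rng.choice(choices)
    return mutant


rng = random.Random(0)
best_so_far = [3, 1, 4, 1, 5]  # toy integer-encoded sequence
candidate = random_mutation(best_so_far, alphabet_size=20, rng=rng)

# Exactly one position differs between the parent and the candidate.
n_changed = sum(a != b for a, b in zip(best_so_far, candidate))
print(candidate, n_changed)
```

Because the function copies its input, the best sequence so far is left untouched and can be mutated again in the next iteration.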

:::{note}

There's nothing special about `RandomMutation` here. You could drop in any solver you implement, as long as it

1. Inherits from `AbstractSolver` in `poli_baselines.core.abstract_solver`, and it
2. implements the abstract method `next_candidate() -> np.ndarray`.

[Check this tutorial on creating solvers for more details](../desired_design_patterns/defining_a_problem_solver.md).

:::
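
To illustrate the two requirements in the note, here is a self-contained sketch of the pattern. `ToyAbstractSolver` is a hypothetical stand-in written for this example; it is not the real `AbstractSolver` from `poli_baselines`, whose constructor and `solve` loop may differ. The toy objective and the `ToyRandomSearch` solver are likewise assumptions for illustration.

```python
import random
from abc import ABC, abstractmethod
from typing import Callable

Sequence = list[int]


class ToyAbstractSolver(ABC):
    """Hypothetical stand-in for an abstract solver (not the poli_baselines class)."""

    def __init__(self, black_box: Callable[[Sequence], float], x0: Sequence, y0: float):
        self.black_box = black_box
        self.history = {"x": [x0], "y": [y0]}

    @abstractmethod
    def next_candidate(self) -> Sequence:
        """Propose the next sequence to evaluate."""

    def solve(self, max_iter: int) -> None:
        # Ask for a candidate, evaluate it, and record both in the history.
        for _ in range(max_iter):
            x = self.next_candidate()
            self.history["x"].append(x)
            self.history["y"].append(self.black_box(x))


class ToyRandomSearch(ToyAbstractSolver):
    """Proposes a uniformly random sequence at every step."""

    def __init__(self, black_box, x0, y0, alphabet_size: int, seed: int = 0):
        super().__init__(black_box, x0, y0)
        self.alphabet_size = alphabet_size
        self.sequence_length = len(x0)
        self.rng = random.Random(seed)

    def next_candidate(self) -> Sequence:
        return [self.rng.randrange(self.alphabet_size) for _ in range(self.sequence_length)]


def count_zeros(x: Sequence) -> float:
    """Toy objective: the number of zeros in the sequence."""
    return float(x.count(0))


toy_x0 = [1, 2, 3, 4]
toy_solver = ToyRandomSearch(count_zeros, toy_x0, count_zeros(toy_x0), alphabet_size=5)
toy_solver.solve(max_iter=10)
print(len(toy_solver.history["y"]), max(toy_solver.history["y"]))
```

The optimization loop lives in the base class, so a new solver only has to decide what to propose next.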

In [11]:
from poli_baselines.solvers.simple.random_mutation import RandomMutation
solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
    alphabet_size=len(problem_info.alphabet)
)

**And that's it!** You can optimize the objective function passed as `black_box` by calling the `.solve(max_iter)` method. Be careful: this might take a while, since every iteration runs FoldX.

In [12]:
solver.solve(max_iter=3)

Your file run OK
End time of FoldX: Thu Aug 10 16:22:09 2023
Total time spend: 42.07 seconds.
validated file "3ned_Repair_1.pdb" => successfully finished
Cleaning BuildModel...DONE
Iteration 0: [[10.2741]], best so far: 10.2741
   ********************************************
   ***                                      ***
   ***             FoldX 4 (c)              ***
   ***                                      ***
   ***     code by the FoldX Consortium     ***
   ***                                      ***
   ***     Jesper Borg, Frederic Rousseau   ***
   ***    Joost Schymkowitz, Luis Serrano   ***
   ***    Peter Vanhee, Erik Verschueren    ***
   ***     Lies Baeten, Javier Delgado      ***
   ***       and Francois Stricher          ***
   *** and any other of the 9! permutations ***
   ***   based on an original concept by    ***
   ***   Raphael Guerois and Luis Serrano   ***
   ********************************************

1 models read: 3ned_Repair.pdb

BackHbond       =               -178.70
SideHbond       =               -76.61
Energy_VdW      =               -267.80
Electro         =               -13.75
Energy_SolvP    =               374.21
Energy_SolvH    =               -351.07
Energy_vdw

## Checking the results

After optimization, the results are stored inside `solver.history`, which is a dictionary with `"x"` and `"y"` keys. Let's check what the best optimization result was:

In [13]:
import numpy as np

best_stability = np.max(solver.history["y"])

inverse_alphabet = {v: k for k, v in problem_info.alphabet.items()}
best_sequence_as_ints = solver.history["x"][np.argmax(solver.history["y"])].flatten()
best_sequence = "".join([inverse_alphabet[x_i] for x_i in best_sequence_as_ints])

print(f"All y values: {solver.history['y']}")
print(f"best stability: {best_stability}")
print(f"Associated sequence: {best_sequence}")

All y values: [array([[9.41639]]), array([[10.2741]]), array([[7.50838]]), array([[7.33792]])]
best stability: 10.2741
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIRDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
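
Once you have the best sequence as a string, a natural next step is to see where it differs from the wildtype. The sketch below uses a hypothetical helper, `list_mutations`, and toy sequences rather than the sequences from the run above. Positions are 1-based indices into the string, which need not match the residue numbering in the PDB file.

```python
def list_mutations(wildtype: str, variant: str) -> list[str]:
    """Describe substitutions as '<old><1-based position><new>', e.g. 'A6K'."""
    if len(wildtype) != len(variant):
        raise ValueError("Sequences must have the same length.")
    return [
        f"{old}{position}{new}"
        for position, (old, new) in enumerate(zip(wildtype, variant), start=1)
        if old != new
    ]


# Toy sequences, not taken from the optimization run above.
toy_wildtype = "EEDNMAII"
toy_variant = "EEDNMKII"
toy_mutations = list_mutations(toy_wildtype, toy_variant)
print(toy_mutations)  # ['A6K']
```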
