# Optimizing protein stability using random mutations

In this example, we optimize the thermal stability of a protein by mutating its wildtype sequence. To do so, we use the `foldx_stability` problem.

:::{warning}

`foldx`-related black boxes require a working installation of `foldx`. [Check our documentation on installing foldx](../../../understanding_foldx/00-installing-foldx.md).

You can also install all of the dependencies to run it using

```
pip install poli-core[foldx]
```

If you have done everything correctly, you should be able to run

```bash
~/foldx/foldx --version
```

:::

## Optimizing `mRouge`

In this example, we will focus on optimizing [`mRouge`, also known as `3NED`](https://www.rcsb.org/structure/3NED), one of the red fluorescent proteins explored in `LaMBO` {cite:p}`stanton2022accelerating`. Before optimization, we need to download the file and "repair" it (see [single mutations using foldx](../../understanding_foldx/01-single-mutation-using-foldx/index.ipynb)).

We assume that the repaired file, `3ned_Repair.pdb`, is already in the current working directory.

In [1]:
!ls

3ned_Repair.pdb                    optimizing_protein_stability.ipynb


In [2]:
from pathlib import Path

wildtype_pdb_path = Path("./3ned_Repair.pdb").resolve()
wildtype_pdb_path.exists()  # Should say True

True

## Defining the objective function

In this tutorial, we optimize the stability of `mRouge` using the `foldx_stability` black box. The first step is creating it:

In [3]:
from poli.objective_repository import FoldXStabilityProblemFactory

problem_factory = FoldXStabilityProblemFactory()

problem = problem_factory.create(
    wildtype_pdb_path=wildtype_pdb_path
)
f, x0 = problem.black_box, problem.x0

poli 🧪: Starting the function foldx_stability as an isolated process.


`problem_factory.create` returns a `Problem` instance. Problems have the following useful attributes:

1. A black-box function in `problem.black_box`. In this case, it is a `FoldXStabilityBlackBox`.
2. An initial design in `problem.x0: np.ndarray`.
3. All the relevant information about the black box (e.g. whether it's deterministic, what the alphabet is...) in `problem.info: BlackBoxInformation`.

These are all the ingredients required for an abstract solver to work. The next section shows how to use a baseline solver, which can be easily replaced by any other solver you implement (as long as it inherits from the `AbstractSolver` in `poli_baselines.core.abstract_solver`).

## Optimizing using a `RandomMutation` solver

In this tutorial we use the simplest baseline for discrete sequence optimization: `RandomMutation`, which takes the best-performing sequence found so far, selects one of its positions at random, and replaces the element there with another element of the alphabet.
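The mutation step described above can be sketched in a few lines of plain Python. This is only an illustration and not the code inside `poli_baselines`: the function name `random_mutation`, the hard-coded 20-letter `ALPHABET`, and the choice to always pick a replacement that differs from the current residue are all assumptions made for this sketch.

```python
import random

# Assumption: the 20 standard amino acids. In practice the alphabet
# would come from the problem being solved.
ALPHABET = list("ACDEFGHIKLMNPQRSTVWY")


def random_mutation(sequence, alphabet=ALPHABET, rng=None):
    """Return a copy of `sequence` with one randomly chosen position changed."""
    rng = rng or random.Random()
    mutant = list(sequence)
    # Select a position uniformly at random.
    position = rng.randrange(len(mutant))
    # Assumption: the replacement always differs from the current residue.
    choices = [token for token in alphabet if token != mutant[position]]
    mutant[position] = rng.choice(choices)
    return mutant


# First residues of the sequence printed later in this tutorial.
best_so_far = list("EEDNMAIIKEF")
candidate = random_mutation(best_so_far, rng=random.Random(0))
n_changes = sum(a != b for a, b in zip(best_so_far, candidate))
print("".join(candidate), n_changes)
```

Because the replacement is drawn from the alphabet without the current residue, each call changes exactly one position.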

:::{note}

There's nothing special about `RandomMutation` here. You could drop in any solver you implement, as long as it

1. inherits from `AbstractSolver` in `poli_baselines.core.abstract_solver`, and
2. implements the abstract method `next_candidate() -> np.ndarray`.

[Check this tutorial on creating solvers for more details](../../the_basics/defining_a_problem_solver.md).

:::
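To make the contract in the note concrete, here is a self-contained sketch of the pattern: a base class with an abstract `next_candidate` and a loop that evaluates each proposal. `SketchSolver` is a hypothetical stand-in written for this example, not the real `AbstractSolver`, whose constructor and `solve` loop may differ. `AlanineScan` and the toy objective `count_alanines` are likewise made up for illustration, and plain lists replace `np.ndarray` to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod


class SketchSolver(ABC):
    """Hypothetical stand-in for a solver base class (not the poli_baselines one)."""

    def __init__(self, black_box, x0, y0):
        self.black_box = black_box
        self.history = {"x": [x0], "y": [y0]}

    @abstractmethod
    def next_candidate(self):
        """Propose the next design to evaluate."""

    def solve(self, max_iter):
        # Propose, evaluate, and record, once per iteration.
        for _ in range(max_iter):
            x = self.next_candidate()
            self.history["x"].append(x)
            self.history["y"].append(self.black_box(x))


class AlanineScan(SketchSolver):
    """Replaces one position of the initial design with 'A' per iteration."""

    def next_candidate(self):
        x0 = self.history["x"][0]
        # Number of proposals made so far decides which position to mutate.
        position = (len(self.history["x"]) - 1) % len(x0)
        return x0[:position] + ["A"] + x0[position + 1:]


def count_alanines(x):
    # Toy objective standing in for an expensive black box such as FoldX.
    return x.count("A")


x0_toy = list("MKV")
scan_solver = AlanineScan(black_box=count_alanines, x0=x0_toy, y0=count_alanines(x0_toy))
scan_solver.solve(max_iter=3)
print(["".join(x) for x in scan_solver.history["x"]])
print(scan_solver.history["y"])
```

Only `next_candidate` changes between solvers in this sketch; the evaluation loop and the history bookkeeping live in the base class.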

In [4]:
from poli_baselines.solvers.simple.random_mutation import RandomMutation

y0 = f(x0)

solver = RandomMutation(
    black_box=f,
    x0=x0,
    y0=y0,
)

**And that's it!** You can optimize the objective function passed as `black_box` by calling the `.solve(max_iter)` method. Be careful, this might take a while.

In [5]:
solver.solve(max_iter=3)

(array([['E', 'E', 'D', 'N', 'M', 'A', 'I', 'I', 'K', 'E', 'F', 'M', 'R',
         'F', 'K', 'T', 'H', 'M', 'E', 'G', 'S', 'V', 'N', 'G', 'H', 'E',
         'F', 'E', 'I', 'E', 'G', 'E', 'G', 'E', 'G', 'R', 'P', 'Y', 'E',
         'G', 'T', 'Q', 'T', 'A', 'K', 'L', 'K', 'V', 'T', 'K', 'G', 'G',
         'P', 'L', 'P', 'F', 'A', 'W', 'D', 'I', 'L', 'S', 'P', 'Q', 'F',
         'S', 'K', 'A', 'Y', 'V', 'K', 'H', 'P', 'A', 'D', 'I', 'P', 'D',
         'Y', 'L', 'K', 'L', 'S', 'F', 'P', 'E', 'G', 'F', 'K', 'W', 'E',
         'R', 'V', 'M', 'N', 'F', 'E', 'D', 'G', 'G', 'V', 'V', 'T', 'V',
         'T', 'Q', 'D', 'S', 'S', 'L', 'Q', 'D', 'G', 'E', 'F', 'I', 'Y',
         'K', 'V', 'K', 'L', 'R', 'G', 'T', 'N', 'F', 'P', 'S', 'D', 'G',
         'P', 'V', 'M', 'Q', 'K', 'K', 'T', 'M', 'G', 'W', 'E', 'A', 'C',
         'S', 'E', 'R', 'M', 'Y', 'P', 'E', 'D', 'G', 'A', 'L', 'K', 'G',
         'E', 'M', 'K', 'M', 'R', 'L', 'K', 'L', 'K', 'D', 'G', 'G', 'H',
         'Y', 'D', 'A', 'E', 'V', 'K',

## Checking the results

After optimization, the results are stored inside `solver.history`, which is a dictionary with `"x"` and `"y"` keys. Let's check what the best optimization result was:

In [6]:
print(f"All y values: {solver.history['y']}")
print(f"best stability: {solver.get_best_performance()}")
print(f"Associated sequence: {''.join(solver.get_best_solution().flatten())}")

All y values: [array([[9.41639]]), array([[9.41639]]), array([[8.26382]]), array([[6.19869]])]
best stability: [9.41639]
Associated sequence: EEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEACSERMYPEDGALKGEMKMRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNTNTKLDITSHNEDYTIVEQYERNEGRHSTGGMDELYK
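As a rough illustration of what "best" means for this history, the sketch below recomputes the best value and a running best from the `y` values printed above. It uses nested lists in place of the `[1, 1]`-shaped arrays so that it runs without `numpy`, and it assumes that stability is being maximized, which matches the reported best of `9.41639`. The variable names are hypothetical.

```python
# y values copied from the output above; nested lists mimic the [1, 1] arrays.
history_y = [[[9.41639]], [[9.41639]], [[8.26382]], [[6.19869]]]

# Flatten each entry to a plain float.
flat_y = [entry[0][0] for entry in history_y]

# Index of the largest value (the first one wins on ties).
best_index = max(range(len(flat_y)), key=flat_y.__getitem__)

# Best value seen up to and including each evaluation.
running_best = []
for y in flat_y:
    running_best.append(y if not running_best else max(running_best[-1], y))

print(f"best index: {best_index}, best value: {flat_y[best_index]}")
print(f"running best: {running_best}")
```

In the run shown above, the best value equals the first entry of the history, so none of the later candidates improved on it. If the `"x"` and `"y"` lists are aligned entry by entry (an assumption here), the same index would select the corresponding sequence from `solver.history["x"]`.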
