# Result Storage

This notebook shows how to store pyPESTO result objects so that they can be loaded later for visualization and further analysis.
This covers optimization, profiling and sampling results. Additionally, we show how to use the optimization history to look further into an optimization run and how to store that history.

After this notebook, you will...

* know how to store and load optimization, profiling and sampling results
* know how to store and load optimization history
* know basic plotting functions for optimization history to inspect optimization convergence

In [None]:
# install if not done yet
# %pip install pypesto --quiet

### Imports

In [None]:
import logging
import random
import tempfile

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display

import pypesto.optimize as optimize
import pypesto.petab
import pypesto.profile as profile
import pypesto.sample as sample
import pypesto.store
import pypesto.visualize as visualize

mpl.rcParams['figure.dpi'] = 100
mpl.rcParams['font.size'] = 18
# set random seeds (Python and NumPy) to get reproducible results
random.seed(3142)
np.random.seed(3142)

%matplotlib inline

## 0. Objective function and problem definition

We will use the Boehm model from the [benchmark initiative](https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab) in this notebook as an example.
We load the model through [PEtab](https://petab.readthedocs.io/en/latest/), a data format for specifying parameter estimation problems in systems biology.

In [None]:
%%capture
# directory of the PEtab problem
petab_yaml = './boehm_JProteomeRes2014/boehm_JProteomeRes2014.yaml'

importer = pypesto.petab.PetabImporter.from_yaml(petab_yaml)
problem = importer.create_problem(verbose=False)

## 1. Filling in the result file

We will now run a standard parameter estimation pipeline with this model. Aside from the part on the history, we shall not go into detail here,
as this is covered in other tutorials such as [Getting Started](getting_started.ipynb) and [AMICI in pyPESTO](amici.ipynb).

### Optimization

In [None]:
%%time

# create optimizers
optimizer = optimize.FidesOptimizer(
    verbose=logging.ERROR, options={"maxiter": 200}
)

# set number of starts
n_starts = 15  # usually a larger number >=100 is used

# Optimization
result = pypesto.optimize.minimize(
    problem=problem, optimizer=optimizer, n_starts=n_starts
)

In [None]:
display(Markdown(result.summary()))

### Profiling

In [None]:
%%time

# Profiling
result = profile.parameter_profile(
    problem=problem,
    result=result,
    optimizer=optimizer,
    profile_index=np.array([1, 1, 1, 0, 0, 0, 0, 0, 1]),
)

### Sampling

In [None]:
%%time

# Sampling
sampler = sample.AdaptiveMetropolisSampler()
result = sample.sample(
    problem=problem,
    sampler=sampler,
    n_samples=5000,  # rather low
    result=result,
    filename=None,
)

## 2. Storing the result file

We collected all our analyses in one result object. We can now store this result object in HDF5 format and reload it later on.

In [None]:
# create temporary file
fn = tempfile.mktemp(".hdf5")

# write the result with the write_result function.
# Choose which parts of the result object to save with
# corresponding booleans.
pypesto.store.write_result(
    result=result,
    filename=fn,
    problem=True,
    optimize=True,
    profile=True,
    sample=True,
)
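
`tempfile.mktemp` only returns a name and is marked as deprecated in the Python standard library, since another process could create a file with that name first. A minimal alternative sketch, assuming a file name `result.hdf5` that is chosen here purely for illustration, builds the path inside a securely created temporary directory:

```python
import tempfile
from pathlib import Path

# The directory is created securely; call tmp_dir.cleanup() when done.
tmp_dir = tempfile.TemporaryDirectory()

# Hypothetical file name (not from the notebook). Nothing exists at this
# path yet, so a writer can create the file from scratch.
fn_safe = str(Path(tmp_dir.name) / "result.hdf5")
print(fn_safe)
```

The resulting string could then be passed as `filename` in place of `fn`.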

Loading the result object again is just as easy:

In [None]:
# load result with read_result function
result_loaded = pypesto.store.read_result(fn)

As you can see, loading the result object triggers a warning regarding the problem. The reason is that the problem is not fully saved to HDF5: a large part of the problem is the objective function, which is not stored. Therefore, after loading the result object, we cannot evaluate the objective function anymore. We can, however, still use the result object for plotting and further analysis.

The best practice is to recreate the problem through PEtab and insert it into the result object after loading it.

In [None]:
# dummy call to non-existent objective function would fail
test_parameter = result.optimize_result[0].x[problem.x_free_indices]
# result_loaded.problem.objective(test_parameter)

In [None]:
result_loaded.problem = problem
print(
    f"Objective function call: {result_loaded.problem.objective(test_parameter)}"
)
print(f"Corresponding saved value: {result_loaded.optimize_result[0].fval}")

To show that storing and loading preserve everything needed for visualization, we will now plot some results from both the original and the loaded object.

## 3. Visualization Comparison

### Optimization

In [None]:
# waterfall plot original
ax = visualize.waterfall(result)
ax.title.set_text("Original Result")

In [None]:
# waterfall plot loaded
ax = visualize.waterfall(result_loaded)
ax.title.set_text("Loaded Result")

### Profiling

In [None]:
# profile plot original
ax = visualize.profiles(result)

In [None]:
# profile plot loaded
ax = visualize.profiles(result_loaded)

### Sampling

In [None]:
# sampling plot original
ax = visualize.sampling_fval_traces(result)

In [None]:
# sampling plot loaded
ax = visualize.sampling_fval_traces(result_loaded)

We can see that we are perfectly able to reproduce the plots from the loaded result object. With this, we can reuse the result object for further analysis and visualization again and again without spending time and resources on rerunning the analyses.
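
Besides comparing plots by eye, the stored values can also be compared numerically. The following is a minimal sketch; the helper name `values_match` and the tolerance are assumptions made for illustration and are not part of pyPESTO:

```python
import math
from typing import Sequence


def values_match(
    original: Sequence[float],
    loaded: Sequence[float],
    rel_tol: float = 1e-12,
) -> bool:
    """Check that two sequences have equal length and pairwise close entries."""
    if len(original) != len(loaded):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(original, loaded)
    )


# Toy values standing in for objective function values of three starts.
print(values_match([138.2, 145.9, 249.7], [138.2, 145.9, 249.7]))
```

With the objects from this notebook, one could pass `[r.fval for r in result.optimize_result.list]` and the same list built from `result_loaded`. Note that `math.isclose` treats `nan` as unequal to everything, so starts with a `nan` objective value would be reported as a mismatch.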

## 4. Optimization History

During optimization, it is possible to regularly write the objective function trace to file. This is useful, e.g., when runs fail, or for various diagnostics. Currently, pyPESTO can save histories to 3 backends: in-memory, as CSV files, or to HDF5 files.

### Memory History

To record the history in-memory, just set `trace_record=True` in the `pypesto.HistoryOptions`. Then, the optimization result contains those histories:

In [None]:
# record the history
history_options = pypesto.HistoryOptions(trace_record=True)

# Run optimizations
result = optimize.minimize(
    problem=problem,
    optimizer=optimizer,
    n_starts=n_starts,
    history_options=history_options,
    filename=None,
)

Now, in addition to queries on the result, we can also access the history.

In [None]:
print("History type: ", type(result.optimize_result.list[0].history))
# print("Function value trace of best run: ", result.optimize_result.list[0].history.get_fval_trace())

fig, ax = plt.subplots(1, 2)
visualize.waterfall(result, ax=ax[0])
visualize.optimizer_history(result, ax=ax[1])
fig.set_size_inches((15, 5))
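
One simple way to read convergence from a raw function value trace is to look at the best value found so far at each iteration. The sketch below works on a plain list of numbers; the helper name `best_so_far` is hypothetical, and the toy trace is made up for illustration:

```python
from itertools import accumulate
from typing import List, Sequence


def best_so_far(fval_trace: Sequence[float]) -> List[float]:
    """Return the running minimum of a function value trace."""
    return list(accumulate(fval_trace, min))


# Toy trace: the optimizer sometimes evaluates worse points in between.
trace = [5.0, 7.0, 3.0, 4.0, 1.0]
print(best_so_far(trace))  # [5.0, 5.0, 3.0, 3.0, 1.0]
```

For a recorded history, the input could be the output of `result.optimize_result.list[0].history.get_fval_trace()`. The sketch does not treat `nan` entries specially, so traces containing failed evaluations should be filtered first.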

### CSV History

The in-memory history is, however, not written to disk. To persist it, the history can be stored to either CSV or HDF5. This is specified via the `storage_file` option. If it ends in `.csv`, a `pypesto.objective.history.CsvHistory` will be employed; if it ends in `.hdf5`, a `pypesto.objective.history.Hdf5History`. Occurrences of the substring `{id}` in the filename are replaced by the multistart id, which allows maintaining a separate file per run (this is necessary for CSV, as otherwise runs are overwritten).
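
The naming rules just described can be illustrated with a small standalone sketch. It does not call pyPESTO; the helper names `expand_storage_file` and `infer_backend` are hypothetical, and raising an error for other suffixes is an assumption of this sketch:

```python
from pathlib import Path
from typing import Optional


def expand_storage_file(template: str, start_id: str) -> str:
    """Replace every `{id}` placeholder with the given multistart id."""
    return template.replace("{id}", start_id)


def infer_backend(storage_file: Optional[str]) -> str:
    """Return a label for the history backend implied by the file suffix."""
    if storage_file is None:
        return "memory"
    suffix = Path(storage_file).suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix == ".hdf5":
        return "hdf5"
    # Assumption of this sketch: anything else is rejected.
    raise ValueError(f"Cannot infer history backend from {storage_file!r}")


# One file per start, so that runs do not overwrite each other.
files = [expand_storage_file("history_{id}.csv", str(i)) for i in range(3)]
print(files)  # ['history_0.csv', 'history_1.csv', 'history_2.csv']
print(infer_backend(files[0]))  # csv
```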

In [None]:
# create temporary file
fn_csv = tempfile.mktemp("_{id}.csv")
# record the history and store to CSV
history_options = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_csv
)

# Run optimizations
result = optimize.minimize(
    problem=problem,
    optimizer=optimizer,
    n_starts=n_starts,
    history_options=history_options,
    filename=None,
)

Note that for this simple cost function, saving to CSV takes a considerable amount of time. This overhead decreases for more costly simulators, e.g., using ODE simulations via AMICI.

In [None]:
print("History type: ", type(result.optimize_result.list[0].history))
# print("Function value trace of best run: ", result.optimize_result.list[0].history.get_fval_trace())

fig, ax = plt.subplots(1, 2)
visualize.waterfall(result, ax=ax[0])
visualize.optimizer_history(result, ax=ax[1])
fig.set_size_inches((15, 5))

### HDF5 History

Just as with CSV, writing the history to HDF5 takes a considerable amount of time.
If a user specifies an HDF5 output file named `my_results.hdf5` and uses a parallelization engine, then:

* a folder is created to contain partial results, named `my_results/` (the stem of the output filename)
* files are created to store the results of each start, named `my_results/my_results_{START_INDEX}.hdf5`
* a file is created to store the combined result from all starts, named `my_results.hdf5`.

Note that this combined file depends on the files in the `my_results/` directory, so it will **cease to function** if `my_results/` is deleted.
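
The file layout described above can be sketched with `pathlib`. This is only an illustration of the naming scheme, not pyPESTO code; the helper name `partial_result_layout` is hypothetical, and zero-based start indices are an assumption:

```python
from pathlib import Path
from typing import Dict, List, Union


def partial_result_layout(
    output_file: str, n_starts: int
) -> Dict[str, Union[Path, List[Path]]]:
    """Derive the folder and per-start file names from the output file name."""
    combined = Path(output_file)
    # The folder is named after the stem: my_results.hdf5 -> my_results/
    folder = combined.with_suffix("")
    per_start = [
        folder / f"{combined.stem}_{i}{combined.suffix}"
        for i in range(n_starts)  # assumption: indices start at 0
    ]
    return {"combined": combined, "folder": folder, "per_start": per_start}


layout = partial_result_layout("my_results.hdf5", n_starts=2)
for path in layout["per_start"]:
    print(path)
```

Keeping `layout["folder"]` next to `layout["combined"]` when moving or archiving results avoids breaking the combined file.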

In [None]:
# create temporary file
fn_hdf5 = tempfile.mktemp(".hdf5")
# record the history and store to HDF5
history_options = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_hdf5
)

# Run optimizations
result = optimize.minimize(
    problem=problem,
    optimizer=optimizer,
    n_starts=n_starts,
    history_options=history_options,
    filename=fn_hdf5,
)

In [None]:
print("History type: ", type(result.optimize_result.list[0].history))
# print("Function value trace of best run: ", result.optimize_result.list[0].history.get_fval_trace())

fig, ax = plt.subplots(1, 2)
visualize.waterfall(result, ax=ax[0])
visualize.optimizer_history(result, ax=ax[1])
fig.set_size_inches((15, 5))

For the HDF5 history, it is possible to load the history from file, and to plot it, together with the optimization result.

In [None]:
# load the history
result_loaded_w_history = pypesto.store.read_result(fn_hdf5)

fig, ax = plt.subplots(1, 2)
visualize.waterfall(result_loaded_w_history, ax=ax[0])
visualize.optimizer_history(result_loaded_w_history, ax=ax[1])
fig.set_size_inches((15, 5))