# Overview
In this notebook, we apply parameter estimation to Kholodenko's model of the EGFR signalling pathway. This involves estimating the parameters, then assessing the quality of the optimized model.

The model is taken from [Kholodenko et al. (1999)](https://www.sciencedirect.com/science/article/pii/S0021925819518804) and has already been implemented as a rule-based model using BioNetGen and in the systems biology model specification format (SBML).

The model in Step 4 reproduces Kholodenko's output, but the output of Kholodenko's model does not fit the experimental data very well, as the following figure shows. To find a set of optimized parameters, I therefore applied a parameter estimation method to the model in Step 4.

The optimized parameters are chosen so that the model outputs follow the experimental data in Kholodenko's paper as closely as possible.

The following figure shows the output of Kholodenko's model versus the experimental data:


<br>

![Image1](ORIGINALKHOLO.png)
<br><br>

## Problem Specification, Import, and Setup

In [None]:
import logging
import matplotlib.pyplot as plt
import numpy as np
import petab
import pypesto
import pypesto.optimize as optimize
import pypesto.petab
import pypesto.sample as sample
import pypesto.visualize as visualize
import amici
from pypesto.store import read_from_hdf5, save_to_hdf5

The parameter estimation problem has also already been formulated in [PEtab](https://github.com/PEtab-dev/PEtab) [3].
The PEtab format is compatible with a variety of tools that are primarily developed within the systems biology community. Here, the [pyPESTO](https://github.com/ICB-DCM/pyPESTO) tool is used for parameter estimation.

The default simulation tool used by pyPESTO is [AMICI](https://github.com/AMICI-dev/AMICI).



In [None]:
petab_problem = petab.Problem.from_yaml(
    "../EGFR/EGFR.yaml"  # path to the PEtab YAML file; adjust to your folder layout
)
# import the PEtab problem into pyPESTO
importer = pypesto.petab.PetabImporter(petab_problem)
problem = importer.create_problem()
model = importer.create_model(verbose=False)

## Parameter Estimation

A multi-start optimization is used here to explore the parameter space for optima efficiently. The author's experience with the difficulty of optimizing this problem led her to use the [Fides](https://github.com/fides-dev/fides) optimizer.
The choice of number of starts is problem-specific.

In [None]:
# create optimizer object which contains all information for doing the optimization
options = {'maxiter':2000}
optimizer = optimize.FidesOptimizer(options=options)
engine = pypesto.engine.MultiProcessEngine()

# do the optimization
result = optimize.minimize(
    problem=problem, optimizer=optimizer, n_starts=50, engine=engine
)
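
To illustrate why several starts are used, the following is a minimal, self-contained sketch of multi-start local optimization on a toy one-dimensional objective. It does not use pyPESTO or Fides; the objective, the plain gradient descent, and all names (`objective`, `local_descent`, `multistart`) are assumptions made for illustration only.

```python
import random


def objective(x):
    """Toy objective (illustrative assumption) with two minima, near x = -2 and x = +2."""
    return (x**2 - 4) ** 2 + x


def gradient(x):
    """Analytical derivative of the toy objective."""
    return 4 * x * (x**2 - 4) + 1


def local_descent(x0, step=0.01, n_iter=2000):
    """Plain gradient descent from x0; it converges to the minimum of the basin x0 lies in."""
    x = x0
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x


def multistart(n_starts, lower, upper, seed=0):
    """Run a local optimization from random start points and sort runs by final value."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)
        x_opt = local_descent(x0)
        results.append((objective(x_opt), x_opt))
    results.sort()  # best (lowest) objective value first
    return results


results = multistart(n_starts=20, lower=-3.0, upper=3.0)
best_fval, best_x = results[0]
print(f"best x = {best_x:.3f}, objective = {best_fval:.3f}")
```

Starts that begin in the basin of the shallower minimum end there, so a single start may miss the better optimum. Sorting the runs by final objective value is also the ordering that a waterfall plot displays.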

## Visualization and Assessment of Optimization

The first plot here is the waterfall plot, which shows the objective function value (the negative log-likelihood, which is minimized) of the estimated parameter values at the end of each optimization run (start). The runs are ordered by this value. Generally, a plateau of several starts at the lowest value suggests that the optimization succeeded and repeatedly found the same good optimum.

If the waterfall plot does not show a plateau at the minimum, the bounds can be adjusted (preferably to more realistic bounds), the optimization can be run with a higher number of starts, or a different optimization method can be tried.

In [None]:
visualize.waterfall(result)

<br>

![Image1](Waterfall.png)
<br><br>
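
The plateau criterion can also be checked numerically. The sketch below counts how many starts end within a tolerance of the best final objective value. The values in `final_fvals` are hypothetical, not results from this notebook, and both the helper `count_plateau` and the tolerance are illustrative assumptions.

```python
def count_plateau(fvals, tol=1e-3):
    """Count how many final objective values lie within `tol` of the best one."""
    ordered = sorted(fvals)
    best = ordered[0]
    return sum(1 for fval in ordered if fval - best <= tol)


# Hypothetical final objective values from 8 starts (not results from this notebook).
final_fvals = [12.3401, 12.3402, 57.9, 12.3401, 103.2, 12.3405, 30.1, 12.3403]

n_plateau = count_plateau(final_fvals)
print(f"{n_plateau} of {len(final_fvals)} starts reached the best value within tolerance")
```

A suitable tolerance depends on the scale of the objective function and on the optimizer's convergence settings.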

The second plot is the parameters plot, which shows the estimated parameter values for each parameter at the end of each start. The vector of parameter values from a single start is indicated by connected dots. 


In [None]:
visualize.parameters(result)

<br>

![Image1](Parameters.png)
<br><br>

We can also conveniently visualize the model fit. This produces the PEtab visualization using the optimized parameters.

In [None]:
from pypesto.visualize.model_fit import visualize_optimized_model_fit
pp1 = visualize_optimized_model_fit(petab_problem=petab_problem, result=result)

<br>

![Image1](Optimization.png)
<br><br>

# Assessment of Maximum Likelihood Estimate

Once optimization appears successful, the maximum likelihood estimate (MLE) can be assessed for parameter and prediction uncertainty.

Parameter uncertainty can be assessed with MCMC sampling.

## MCMC Sampling

MCMC sampling is a method of analysing the uncertainty of a parameter estimate. Here, the adaptive Metropolis-Hastings algorithm is used, with parallel tempering.

In [None]:
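# Maximum likelihood estimate from optimization, reduced to the estimated (free)
# parameters, since the sampler starts from a vector of free parameters only.
mle = problem.get_reduced_vector(result.optimize_result.list[0]['x'])

sampler = sample.AdaptiveParallelTemperingSampler(
    internal_sampler=sample.AdaptiveMetropolisSampler(), n_chains=3
)

result = sample.sample(
    problem,
    n_samples=10000,
    sampler=sampler,
    x0=mle,
)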
elapsed_time = result.sample_result.time
print(f"Elapsed time: {round(elapsed_time,2)}")
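
The core accept/reject step that these samplers build on can be sketched in a few lines. The example below is a plain random-walk Metropolis sampler on a toy standard-normal target. It is neither adaptive nor tempered and is not pyPESTO's implementation; the target, the proposal width, and all names are assumptions for illustration.

```python
import math
import random


def neg_log_post(x):
    """Toy negative log-posterior (illustrative assumption): standard normal up to a constant."""
    return 0.5 * x * x


def metropolis(neg_log_post, x0, n_samples, proposal_std=1.0, seed=0):
    """Random-walk Metropolis sampling; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    x = x0
    fx = neg_log_post(x)
    chain = [x]
    n_accept = 0
    for _ in range(n_samples):
        # Propose a new point from a symmetric Gaussian centred on the current point.
        x_new = x + rng.gauss(0.0, proposal_std)
        f_new = neg_log_post(x_new)
        # Accept with probability min(1, exp(fx - f_new)).
        if math.log(1.0 - rng.random()) < fx - f_new:
            x, fx = x_new, f_new
            n_accept += 1
        chain.append(x)
    return chain, n_accept / n_samples


chain, acceptance_rate = metropolis(neg_log_post, x0=0.0, n_samples=20000)
mean = sum(chain) / len(chain)
variance = sum((value - mean) ** 2 for value in chain) / len(chain)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, acceptance = {acceptance_rate:.2f}")
```

An adaptive variant tunes the proposal during sampling, and parallel tempering runs additional chains at higher temperatures that can exchange states with the chain at the real temperature.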

To visualize the result, we can plot e.g. kernel density estimates or histograms.

In [None]:
for i_chain in range(len(result.sample_result.betas)):
    pypesto.visualize.sampling_1d_marginals(
        result, i_chain=i_chain, suptitle=f"Chain:{i_chain}", size = (40,32)
    )

### The samples of the first chain 


![Image1](MCMCchains_0.png)
<br><br>

### The samples of the second chain 


![Image1](MCMCchains_1.png)
<br><br>

### The samples of the third chain


![Image1](MCMCchains_2.png)
<br><br>

Next, the log posterior trace can also be visualized.

In [None]:
for i_chain in range(len(result.sample_result.betas)):
    plt.figure()
    plt.plot(np.log10(result.sample_result.trace_neglogpost[i_chain]),'go')

### The log posterior of the first chain
The x-axis shows the iteration, and the y-axis shows the log10 of the negative log-posterior value.


![Image1](logfvaltrace_0.png)
<br><br>

### The log posterior of the second chain


![Image1](logfvaltrace_1.png)
<br><br>

### The log posterior of the third chain


![Image1](logfvaltrace_2.png)
<br><br>

The parameter trajectories can also be visualized:

In [None]:
for i_chain in range(len(result.sample_result.betas)):
    visualize.sampling_parameter_traces(
        result, i_chain=i_chain, size=(40, 32)
    )

### The parameter trajectories of the first chain

![Image1](parametertrace_0.png)
<br><br>

### The parameter trajectories of the second chain

![Image1](parametertrace_1.png)
<br><br>

### The parameter trajectories of the third chain

![Image1](parametertrace_2.png)
<br><br>

We can also plot approximate credibility intervals based on the MCMC chains:

In [None]:
alpha = [99, 95, 90]
ax = visualize.sampling_parameter_cis(result, alpha=alpha, size=(25, 30), step =0.1)

![Image1](CIS.png)
<br><br>
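
Percentile-based intervals of this kind can be computed directly from the samples of a single parameter. The sketch below is an illustration under simple assumptions: the helpers `percentile` and `credibility_interval` are hypothetical, and the evenly spaced `samples` stand in for a real chain.

```python
def percentile(sorted_values, q):
    """Percentile q (0 to 100) of a pre-sorted list, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def credibility_interval(samples, level):
    """Central interval that contains `level` percent of the samples."""
    ordered = sorted(samples)
    tail = (100.0 - level) / 2.0
    return percentile(ordered, tail), percentile(ordered, 100.0 - tail)


# Stand-in for the samples of one parameter: 101 evenly spaced values in [0, 1].
samples = [i / 100 for i in range(101)]

intervals = {level: credibility_interval(samples, level) for level in (99, 95, 90)}
for level, (lower, upper) in intervals.items():
    print(f"{level}% interval: [{lower:.3f}, {upper:.3f}]")
```

In practice, the early part of a chain is often discarded as burn-in before intervals are computed.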

# MCMC sampling diagnostics

## Prediction Uncertainty of Observables

To assess prediction uncertainty, the MCMC samples are simulated as an ensemble, and the state and observable trajectories for each sample are saved. Percentiles are then computed from the ensemble predictions.
In this part, we illustrate how to assess the quality of the MCMC samples.

Predictions are made by creating a parameter ensemble from the sample, then applying a predictor to the ensemble. The predictor requires a simulation tool; here, AMICI is used. First, the predictor is set up.

In [None]:
import numpy as np
from pypesto.C import AMICI_STATUS, AMICI_T, AMICI_X, AMICI_Y, EnsembleType
from pypesto.ensemble import Ensemble
from pypesto.predict import AmiciPredictor


# This post-processor transforms the output of the simulation tool
# such that the output is compatible with the next steps.
# Failed simulations (non-zero AMICI status) are replaced by NaN arrays.
def post_processor(amici_outputs, output_type, output_ids):
    outputs = [
        amici_output[output_type]
        if amici_output[AMICI_STATUS] == 0
        else np.full((len(amici_output[AMICI_T]), len(output_ids)), np.nan)
        for amici_output in amici_outputs
    ]
    return outputs


# Setup post-processors for both states and observables.
from functools import partial

amici_objective = result.problem.objective
state_ids = amici_objective.amici_model.getStateIds()
observable_ids = amici_objective.amici_model.getObservableIds()
post_processor_x = partial(
    post_processor,
    output_type=AMICI_X,
    output_ids=state_ids,
)
post_processor_y = partial(
    post_processor,
    output_type=AMICI_Y,
    output_ids=observable_ids,
)

# Create pyPESTO predictors for states and observables
predictor_x = AmiciPredictor(
    amici_objective,
    post_processor=post_processor_x,
    output_ids=state_ids,
)
predictor_y = AmiciPredictor(
    amici_objective,
    post_processor=post_processor_y,
    output_ids=observable_ids,
)

Next, the ensemble is created.

In [None]:
from pypesto.C import EnsembleType
from pypesto.ensemble import Ensemble

# corresponds to only the estimated parameters
x_names = result.problem.get_reduced_vector(result.problem.x_names)

# Create the ensemble with the MCMC chain from parallel tempering with the real temperature.
ensemble = Ensemble.from_sample(
    result,
    x_names=x_names,
    ensemble_type=EnsembleType.sample,
    lower_bound=result.problem.lb,
    upper_bound=result.problem.ub,
)

The predictor is then applied to the ensemble to generate predictions at the timepoints for which we have experimental data, and the results are then plotted.

In [None]:
from pypesto.engine import MultiThreadEngine

engine = MultiThreadEngine()

# Predict the state trajectories for each ensemble member.
ensemble_prediction_x = ensemble.predict(
    predictor_x, prediction_id=AMICI_X, engine=engine
)

# Predict the observable trajectories for each ensemble member.
# These are the predictions plotted in the following cells.
ensemble_prediction = ensemble.predict(
    predictor_y, prediction_id=AMICI_Y, engine=engine
)

In [None]:
# Plotting based on each experimental condition
## Condition 0 is initial concentration for EGF = 0.2 nM
## Condition 1 is initial concentration for EGF = 2 nM
## Condition 2 is initial concentration for EGF = 20 nM

from pypesto.C import CONDITION, OUTPUT
credibility_interval_levels = [90, 95, 99]
ax = visualize.sampling_prediction_trajectories(
    ensemble_prediction,
    levels=credibility_interval_levels,
    size=(15, 9),
    axis_label_padding=60,
    groupby=CONDITION,
)

The black line in each plot shows the observables obtained with the maximum likelihood estimate, and the colored bands show the observables obtained by simulating the MCMC samples as an ensemble, which illustrates the prediction uncertainty.

![Image1](observable_condition.png)
<br><br>

In [None]:
# Plotting based on each observable
ax = visualize.sampling_prediction_trajectories(
    ensemble_prediction,
    levels=credibility_interval_levels,
    size=(15, 9),
    axis_label_padding=60,
    groupby=OUTPUT,
    reverse_opacities=True,
)

![Image1](observable_output.png)
<br><br>
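
The idea behind these bands, simulating every ensemble member and taking percentiles at each timepoint, can be sketched without AMICI or pyPESTO. In the example below, the exponential-decay model, the uniformly drawn parameter samples, and all function names are assumptions for illustration, not the EGFR model or its MCMC samples.

```python
import math
import random


def simulate(k, timepoints):
    """Toy model (illustrative assumption): exponential decay y(t) = exp(-k * t)."""
    return [math.exp(-k * t) for t in timepoints]


def percentile(sorted_values, q):
    """Percentile q (0 to 100) of a pre-sorted list, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def prediction_band(parameter_samples, timepoints, level):
    """Simulate each sample, then take central percentiles at every timepoint."""
    trajectories = [simulate(k, timepoints) for k in parameter_samples]
    tail = (100.0 - level) / 2.0
    lower, upper = [], []
    for i in range(len(timepoints)):
        values = sorted(trajectory[i] for trajectory in trajectories)
        lower.append(percentile(values, tail))
        upper.append(percentile(values, 100.0 - tail))
    return lower, upper


rng = random.Random(0)
k_samples = [rng.uniform(0.5, 1.5) for _ in range(500)]  # stand-in for MCMC samples
timepoints = [0.0, 1.0, 2.0, 4.0]

lower, upper = prediction_band(k_samples, timepoints, level=90)
for t, lo_val, up_val in zip(timepoints, lower, upper):
    print(f"t = {t}: 90% band = [{lo_val:.3f}, {up_val:.3f}]")
```

The band collapses at t = 0, where every trajectory starts from the same value, and widens where the prediction is sensitive to the parameter.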

Custom timepoints can also be specified for each condition:

In [None]:
# Create a custom objective with new output timepoints.
timepoints = [np.linspace(0, 120, 121), np.linspace(0, 120, 121), np.linspace(0, 120, 121)]
amici_objective_custom = amici_objective.set_custom_timepoints(
    timepoints=timepoints
)

# Create an observable predictor with the custom objective.
predictor_y_custom = AmiciPredictor(
    amici_objective_custom,
    post_processor=post_processor_y,
    output_ids=observable_ids,
)

# Predict then plot.
ensemble_prediction = ensemble.predict(
    predictor_y_custom, prediction_id=AMICI_Y, engine=engine
)

In [None]:
ax = visualize.sampling_prediction_trajectories(
    ensemble_prediction,
    levels=credibility_interval_levels,
    groupby=CONDITION,
    size=(15, 9),
)

![Image1](simulated_condition.png)
<br><br>

In [None]:
ax = visualize.sampling_prediction_trajectories(
    ensemble_prediction,
    levels=credibility_interval_levels,
    groupby=OUTPUT,
    size=(15, 9),
)

![Image1](simulated_output.png)
<br><br>