Numerical structural identifiability analysis #196

formersbach · 2022-09-22T09:55:02Z

I believe identifiability analysis is a very useful tool in model development and experiment design!
Based on the work of Joubert and Stigter I have implemented the numerical part of their algorithm in a proof of concept script.
Structural identifiability is a structural property of any model. To quote Joubert et al:
"One way of characterizing structural identifiability is to say that at least 1 parameter has a confidence interval that spans the interval negative infinity to positive infinity. Any form of unidentifiability [...] calls into question the predictive power of a model and urges its user to interpret all results with caution"
The idea of their algorithm is to analyze the so called parametric output sensitivity matrix (OSM). Full rank of the OSM is a sufficient condition for structural identifiability. Rank deficiency indicates non-influence by parameters or correlation between parameters.
The rank of the OSM is calculated using singular value decomposition. Since all calculations are numerical and non-exhaustive there is a non-zero possibility of false negatives (identifiable parameter is deemed unidentifiable).

In principle no user input is required. A user defined observable matrix can be supplied however.

The script is ~ 100 lines long and requires the sympy package for symbolic calculations.

The only information required is a Text2Model object.
Where in Pasmopy/Biomass do you think this should go? Should I just provide you the proof of concept script?

himoto · 2022-09-22T10:37:30Z

Hi @formersbach, sounds fantastic to me!
Regarding implementation, it would be nice to add your script to text2model.py as a method of Text2Model class.
Or you can create something like identifiability_analysis.py in construction/ folder and implement as follows:

from .text2model import Text2Model

# Example
class IdentifiabilityAnalysis(Text2Model):
    ...

Anyways it depends on the usage.
If you need sympy, please add it to requirements.txt.
Can you please show me a usage example of it in the reply of this or Show and tell in a Discussion?

formersbach · 2022-09-28T09:42:49Z

Hey @himoto !
This is the proof of concept script for identifiability:

from biomass import Text2Model
import sympy as sym
import numpy as np
from scipy.integrate import solve_ivp
import os
import re
import matplotlib.pyplot as plt

class ident_analysis:
    def __init__(self, model_file) -> None:
        self.model = Text2Model(model_file)
        self.model.convert(overwrite=True)
        self.symbol_list = self.create_symbols()
        self.v = self.get_v()
        self.N = sym.Matrix(self.model.stoichiometry_matrix.toarray())
        self.dydt = self.N * self.v

    def create_symbols(self):
        #Creates list of sympy symbols for integration
        temp_text = "import sympy\nsymbol_list=[\n"
        for species in self.model.species:
            temp_text += f"\tsympy.Symbol(\"{species}\"),\n"
        for parameter in self.model.parameters:
            temp_text += f"\tsympy.Symbol(\"{parameter}\"),\n"
        temp_text += "]"
        with open("tmp_symbols.py", "w") as tmp:
            tmp.writelines(temp_text)
        try:
            from tmp_symbols import symbol_list
        except ImportError:
            raise ImportError("The temporary symbols file could not be created.")
        os.remove("tmp_symbols.py")
        return symbol_list

    def get_v(self):
        #Turn reaction velocities to sympy equation vector
        v = []
        for reaction in self.model.kinetics:
            v += [sym.parsing.sympy_parser.parse_expr(reaction.rate)]
        return sym.Matrix(v)

    def get_values(self, perturbation, t_steps=100, perturb_species=None):
        #Performs the integration by substituting numerical values into the equations
        params, y0 = [], []
        for symbol in self.symbol_list:
            value=1
            default_values = self.model.param_info + self.model.init_info
            for default in default_values:
                if str(symbol) in default:
                    value = float(default.split("=")[1].strip())
            if str(symbol) == perturb_species:
                value += perturbation
            if str(symbol) in self.model.species:
                y0.append(value)
            elif str(symbol) in self.model.parameters:
                params.append([symbol, value])
        system = self.dydt.subs(params)
        def int_step(t, y):
            assert len(y) == len(self.model.species)
            replace = list(zip(self.symbol_list[:len(self.model.species)], y))
            return np.array(system.subs(replace)).flatten().astype(np.float64)
        t_eval = np.linspace(0, 100, num=t_steps, endpoint=False)
        integrated = solve_ivp(int_step, t_span=(0, 100), y0=y0, t_eval=t_eval)
        assert integrated.y.shape == (len(self.model.species), len(t_eval))
        return integrated

    def default_output_matrix(self):
        # Generates a default output matrix from the observable description
        observables = self.model.obs_desc
        output = np.zeros((len(observables), len(self.model.species)))
        indx_dict = {species: indx for indx, species in enumerate(self.model.species)}
        for i, observable in enumerate(observables):
            assert len(observable) == 2
            _, equation = observable
            re_find_u = r"(?<=u\[)(.+?)(?=\])"
            summands = re.split(r"\+|\-",equation)
            for summand in summands:
                spec = re.findall(re_find_u, summand)
                assert len(spec) == 1, "Observable equation is non-linear."
                spec = spec[0]
                output[i, indx_dict[spec]] = 1
        return output

    def apply_output(self, raw_sim, output_matrix=None):
        # Drops the non-observed species concentrations
        if output_matrix is None:
            output_matrix = self.default_output_matrix()
        return np.matmul(output_matrix, raw_sim.y)

    def run_perturbation(self, output_matrix=None, init_cond=True, const=None, perturbation=0.001, t_steps=100):
        # Performs unperturbed integration, perturbed integration and calculates sensitivities
        if init_cond:
            perturb = self.model.species + self.model.parameters
        else:
            perturb = self.model.parameters
        if const:
            perturb = [element for element in perturb if element not in const]
        base = self.apply_output(self.get_values(perturbation=perturbation, t_steps=t_steps))
        if output_matrix:
            num_obs = output_matrix.shape[0]
        else:
            num_obs = self.default_output_matrix().shape[0]
        sensitivity = np.zeros((num_obs*t_steps, len(perturb)))
        for i, param in enumerate(perturb):
            res = self.apply_output(self.get_values(perturbation, t_steps=t_steps, perturb_species=param))
            sensitivity[:, i] = ((res - base)/perturbation).flatten()
        assert sensitivity.shape == (num_obs * t_steps, len(perturb))
        return sensitivity

    def analyze_sensitivities(self):
        # Singular value decomposition of sensitivity matrix
        sensitivity = self.run_perturbation()
        U, E, Vstar = np.linalg.svd(sensitivity)
        return U, E, Vstar.T

def plot_sing_values(E):
    # Plots log10 transformed singular values
    log10E = np.log10(E)
    x = list(range(1, len(E) + 1))
    plt.stem(x, log10E, use_line_collection=True)
    plt.savefig("Singular_values")
    return

def plot_unid_components(V, index, ident_analys):
    # Plots given column of V
    column_vec = V[:, index]
    x = [str(symb) for symb in ident_analys.symbol_list]
    plt.stem(x, column_vec, use_line_collection=True)
    plt.savefig("Column_vector")

if __name__ == "__main__":
    import time
    start = time.time()
    test = ident_analysis("test.txt")
    U, E, V = test.analyze_sensitivities()
    end = time.time()
    print(f"Analysis took {end - start} seconds.")
    # plot_sing_values(E)
    #or
    # plot_unid_components(V, 7, test)

You can try it out with this simple unidentifiable text model:

A --> B
B --> P
A --> C
C --> P

@obs A: u[A]
@obs P: u[P]

If you add an observation for C the unidentifiability is resolved.

formersbach · 2023-01-20T10:06:54Z

Hey @himoto!
I would like to close this issue, since it's currently not planned to add this to biomass right?
In any case discussions is more fitting for this, sorry for opening it as an issue!

himoto · 2023-01-20T10:24:43Z

No problem.
New idea to improve BioMASS is always more than welcome!
I would like to transfer this issue to Discussions.

formersbach added enhancement New feature or request text2model Text-to-model conversion labels Sep 22, 2022

himoto assigned himoto and formersbach and unassigned himoto Sep 23, 2022

biomass-dev locked and limited conversation to collaborators Jan 20, 2023

himoto converted this issue into discussion #234 Jan 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Numerical structural identifiability analysis #196

Numerical structural identifiability analysis #196

formersbach commented Sep 22, 2022

himoto commented Sep 22, 2022

formersbach commented Sep 28, 2022

formersbach commented Jan 20, 2023

himoto commented Jan 20, 2023

This issue was moved to a discussion.

This issue was moved to a discussion.

Numerical structural identifiability analysis #196

Numerical structural identifiability analysis #196

Comments

formersbach commented Sep 22, 2022

himoto commented Sep 22, 2022

formersbach commented Sep 28, 2022

formersbach commented Jan 20, 2023

himoto commented Jan 20, 2023

This issue was moved to a discussion.