In [1]:
from pyenzyme import EnzymeMLDocument, Reactant, Replicate

# EnzymeML application: From STRENDA-DB to COPASI modeling

This notebook demonstrates an exemplary use of the PyEnzyme Thin Layers reported in [Range et al. 2021](https://doi.org/10.1111/febs.16318): the conversion of a STRENDA-DB entry to EnzymeML, followed by a time-course simulation and a parameter estimation using COPASI. In addition, the notebook covers editing of an EnzymeML document, both because the STRENDA-DB entry lacks some information and to demonstrate how an existing EnzymeML document can be enriched with metadata and time-course data.

## 1. Creating EnzymeML documents from STRENDA DB entries

STRENDA DB is a database on enzyme-catalyzed reactions that covers the most important information on reaction conditions and kinetic parameters. An EnzymeML document is created from a STRENDA DB entry via a STRENDA DB-specific thin API layer (TL_STRENDA), which builds on the object layer of the PyEnzyme library. The Thin Layer is hosted as a module of PyEnzyme and can be used to process any file in the STRENDA DB XML format. For this example, a dataset was previously downloaded from the STRENDA DB [query page](https://www.beilstein-strenda-db.org/strenda/public/query.xhtml) and added to this directory. To execute the conversion, the ```path``` to the XML file and the ```out_dir``` target directory that the Thin Layer will write to are passed to ```ThinLayerStrendaML.toEnzymeML```.

In [2]:
from pyenzyme.thinlayers import ThinLayerStrendaML

Matplotlib backend set to: "nbAgg"
Matplotlib interface loaded (pysces.plt.m)
Pitcon routines available
NLEQ2 routines available
SBML support available
You are using NumPy (1.21.2) with SciPy (1.7.0)
RateChar is available

No module named 'ipyparallel'
INFO: Parallel scanner not available

PySCeS environment
******************
pysces.model_dir = /Users/janrange/Pysces/psc
pysces.output_dir = /Users/janrange/Pysces


***********************************************************************
* Welcome to PySCeS (1.0.0) - Python Simulator for Cellular Systems   *
*                http://pysces.sourceforge.net                        *
* Copyright(C) B.G. Olivier, J.M. Rohwer, J.-H.S. Hofmeyr, 2004-2022  *
* Triple-J Group for Molecular Cell Physiology                        *
* Stellenbosch University, ZA and VU University Amsterdam, NL         *
* PySCeS is distributed under the PySCeS (BSD style) licence, see     *
* LICENCE.txt (supplied with this release) for details                *
* Pl

In [3]:
# Convert the STRENDA-DB XML to EnzymeML
ThinLayerStrendaML.toEnzymeML(
    path="./STRENDA/3IZNOK_TEST.xml",
    out_dir="./STRENDA/generated/"
)


Archive was written to STRENDA/generated/3IZNOK_TEST.omex
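
The generated `.omex` file is a COMBINE archive, which is a ZIP container with a `manifest.xml` describing its contents. As a quick sanity check, the entries of such an archive can be listed with the Python standard library. This is a minimal sketch: the helper name `list_archive_entries` is hypothetical and not part of PyEnzyme, and the path is the one reported by the conversion above.

```python
import os
import zipfile


def list_archive_entries(archive_path):
    """Return the sorted names of all files stored in a ZIP-based archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())


# Path as reported by the conversion step; skipped if the file is not present
omex_path = "./STRENDA/generated/3IZNOK_TEST.omex"
if os.path.exists(omex_path):
    for entry in list_archive_entries(omex_path):
        print(entry)
```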



## 2. Editing of EnzymeML: simulation of time course data from kinetic parameters

For an enzyme-catalyzed reaction, STRENDA-DB entries provide the kinetic parameters KM and kcat, assuming a Michaelis–Menten model, together with the concentration range of the substrate. However, they lack information on the product and on the time course of substrate or product concentrations. The missing entities will be added to the appropriate fields of the document, and time-course data will be simulated using the measurement setups that were previously extracted from the STRENDA-DB entry.
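
For reference, the irreversible Michaelis–Menten rate law in its standard textbook form is given below, where $[E]_0$ denotes the total enzyme concentration and $[S]$ the substrate concentration. The expression stored in the EnzymeML document may be parameterized differently, so this is meant as orientation only.

$$
v = \frac{k_{cat}\,[E]_0\,[S]}{K_M + [S]}, \qquad \frac{d[S]}{dt} = -v
$$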

In [4]:
# Load the EnzymeML document
enzmldoc = EnzymeMLDocument.fromFile("./STRENDA/generated/3IZNOK_TEST.omex")
enzmldoc.printDocument()

3IZNOK_TEST
>>> Reactants
	ID: s0 	 Name: 1H-indole
	ID: s1 	 Name: (2S)-2-amino-3-phosphonooxypropanoic acid
>>> Proteins
	ID: p0 	 Name: TrpB2o from Arabidopsis thaliana
>>> Complexes
>>> Reactions
	ID: r0 	 Name: indole fixed, o-phospho-L-serine varied


#### Adding missing entities and reaction modification

In [5]:
# Add the missing protein sequence
protein = enzmldoc.getProtein("p0")
# Join the lines so the stored sequence contains no line breaks or indentation
protein.sequence = "".join("""
    MAMRIRIDLPQDEIPAQWYNILPDLPEELPPPQDPTGKSLELLKEVLPSKVLELE
    FAKERYVKIPDEVLERYLQVGRPTPIIRAKRLEEYLGNNIKIYLKMESYTYTGS
    HKINSALAHVYYAKLDNAKFVTTETGAGQWGSSVALASALFRMKAHIFMVRTSY
    YAKPYRKYMMQMYGAEVHPSPSDLTEFGRQLLAKDSNHPGSLGIAISDAVEYAH
    KNGGKYVVGSVVNSDIMFKTIAGMEAKKQMELIGEDPDYIIGVVGGGSNYAALA
    YPFLGDELRSGKVRRKYIASGSSEVPKMTKGVYKYDYPDTAKLLPMLKMYTIGS
    DFVPPPVYAGGLRYHGVAPTLSLLISKGIVQARDYSQEESFKWAKLFSELEGYI
    PAPETSHALPILAEIAEEAKKSGERKTVLVSFSGHGLLDLGNYASVLFKEKLAA
    ALEHHHHHH""".split())

In [6]:
# Add the missing products to the document using their ChEBI IDs
product = Reactant.fromChebiID("CHEBI:16828", vessel_id="v0")
coproduct = Reactant.fromChebiID("CHEBI:43474", vessel_id="v0")

product_id = enzmldoc.addReactant(product)
coproduct_id = enzmldoc.addReactant(coproduct)

# Finally, add them as products to the reaction
reaction = enzmldoc.getReaction("r0")

reaction.addProduct(species_id=product_id, stoichiometry=1.0, enzmldoc=enzmldoc)
reaction.addProduct(species_id=coproduct_id, stoichiometry=1.0, enzmldoc=enzmldoc)

# Inspect the reaction scheme for confirmation
enzmldoc.printReactionSchemes()

indole fixed, o-phospho-L-serine varied:
1.0 1H-indole + 1.0 (2S)-2-amino-3-phosphonooxypropanoic acid -> 1.0 L-tryptophan + 1.0 hydrogenphosphate



#### Time-course simulation

This section uses the given model and measurement setups to simulate plausible time-course data, which will later be used to re-estimate the parameters with COPASI. The demonstration is somewhat trivial because the parameters have already been estimated, but it shows the workflow for cases in which the parameters are not yet known and have to be obtained by modeling.
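
The idea behind such a simulation can be sketched without PyEnzyme as a fixed-step explicit Euler integration of the Michaelis–Menten rate law. The function names and all parameter values below are illustrative assumptions and are not taken from the STRENDA-DB entry. The notebook's own `simulate` function, defined in the following cells, relies on `model.evaluate` and its own update rule instead.

```python
def michaelis_menten_rate(substrate, enzyme, kcat, km):
    """Irreversible Michaelis-Menten rate: kcat * E * S / (Km + S)."""
    return kcat * enzyme * substrate / (km + substrate)


def simulate_euler(substrate_init, enzyme, kcat, km, n_steps=200, dt=1.0):
    """Integrate dS/dt = -v(S) with a fixed-step explicit Euler scheme."""
    time, substrate = [0.0], [substrate_init]
    for step in range(1, n_steps + 1):
        current = substrate[-1]
        rate = michaelis_menten_rate(current, enzyme, kcat, km)
        # Clamp at zero so a coarse step cannot yield negative concentrations
        substrate.append(max(current - rate * dt, 0.0))
        time.append(step * dt)
    return time, substrate


# Hypothetical values in consistent units (mM and s); not from the entry
time, substrate = simulate_euler(
    substrate_init=0.5, enzyme=0.01, kcat=1.0, km=0.2
)
print(len(time), substrate[0], substrate[-1])
```

A smaller `dt` gives a more accurate trajectory at the cost of more steps; for stiff or fast kinetics a dedicated ODE solver is usually the better choice.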

In [7]:
# Get the kinetic model of the reaction for the simulation
model = enzmldoc.getReaction("r0").model

In [8]:
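def simulate(substrate_init, protein_init, model, time_steps=range(1, 201)):
    """Simulate a substrate time course for a single measurement.

    Returns the time points and the substrate concentration at each point.
    """

    substrate_conc, time = [substrate_init], [0]

    for time_step in time_steps:

        # Evaluate the velocity at the current substrate concentration
        velocity = model.evaluate(
            p0=protein_init,
            s1=substrate_conc[-1]
        )

        time.append(time_step)

        # Update rule used in this notebook: the concentration is reduced
        # by velocity * current concentration in every step
        substrate_conc.append(
            substrate_conc[-1] - velocity * substrate_conc[-1]
        )

    return time, substrate_conc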

In [9]:
# Iterate through all measurements and append the new replicate data
for measurement in enzmldoc.measurement_dict.values():
    
    # Gather the important concentrations
    protein_conc = measurement.getProtein("p0").init_conc
    substrate_conc = measurement.getReactant("s1").init_conc
    substrate_unit = measurement.getReactant("s1").unit
    
    time, data = simulate(substrate_conc, protein_conc, model)
    
    replicate = Replicate(
        id=f"replicate_meas{measurement.id}",
        species_id="s1",
        data_unit=substrate_unit,
        time_unit="sec",
        data=data,
        time=time
    )
    
    measurement.addReplicates([replicate], enzmldoc)
    
    measurement.printMeasurementScheme()

>>> Measurement m0: measurement_1
    s0 | initial conc: 100.0 uM 	| #replicates: 0
    s1 | initial conc: 0.0 mM 	| #replicates: 1
    p0 | initial conc: 10.0 uM 	| #replicates: 0
>>> Measurement m1: measurement_2
    s0 | initial conc: 100.0 uM 	| #replicates: 0
    s1 | initial conc: 0.1 mM 	| #replicates: 1
    p0 | initial conc: 10.0 uM 	| #replicates: 0
>>> Measurement m2: measurement_3
    s0 | initial conc: 100.0 uM 	| #replicates: 0
    s1 | initial conc: 0.2 mM 	| #replicates: 1
    p0 | initial conc: 10.0 uM 	| #replicates: 0
>>> Measurement m3: measurement_4
    s0 | initial conc: 100.0 uM 	| #replicates: 0
    s1 | initial conc: 0.30000000000000004 mM 	| #replicates: 1
    p0 | initial conc: 10.0 uM 	| #replicates: 0
>>> Measurement m4: measurement_5
    s0 | initial conc: 100.0 uM 	| #replicates: 0
    s1 | initial conc: 0.4 mM 	| #replicates: 1
    p0 | initial conc: 10.0 uM 	| #replicates: 0
>>> Measurement m5: measurement_6
    s0 | initial conc: 100.0 uM 	| #replicate

In [10]:
# Finally, write the new EnzymeML document to a new file
enzmldoc.toFile("./COPASI", name="3IZNOK_Simulated")


Archive was written to COPASI/3IZNOK_Simulated.omex



## 3. Kinetic modeling of EnzymeML data by COPASI

COPASI is a modeling and simulation environment, which supports the OMEX format. Using the PyEnzyme library and a COPASI-specific thin API layer (TL_COPASI), the time course data (measured concentrations of substrate or product) are loaded into COPASI. Within COPASI, different kinetic laws are applied, kinetic parameters are estimated, and plots are generated to assess the result. The selected kinetic model and the estimated kinetic parameters are then added to the EnzymeML document.
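
To illustrate what re-estimating kinetic parameters from time-course data means, the following sketch fits a Michaelis–Menten model to synthetic data with a coarse grid search over candidate parameters. It is a deliberately simple stand-in written in plain Python: it does not use COPASI or the TL_COPASI thin layer, it does not reflect the algorithms COPASI applies, and all names and values are hypothetical.

```python
from itertools import product


def simulate_substrate(s0, vmax, km, n_steps=50, dt=1.0):
    """Explicit Euler time course for dS/dt = -vmax * S / (km + S)."""
    course = [s0]
    for _ in range(n_steps):
        s = course[-1]
        course.append(max(s - dt * vmax * s / (km + s), 0.0))
    return course


def sum_squared_error(observed, s0, vmax, km):
    """Sum of squared residuals between observed and simulated courses."""
    simulated = simulate_substrate(s0, vmax, km, n_steps=len(observed) - 1)
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def grid_search_fit(observed, s0, vmax_grid, km_grid):
    """Return the (vmax, km) grid point with the smallest squared error."""
    return min(
        product(vmax_grid, km_grid),
        key=lambda params: sum_squared_error(observed, s0, *params),
    )


# Synthetic, noise-free data generated from hypothetical "true" parameters
true_vmax, true_km = 0.01, 0.2
observed = simulate_substrate(0.5, true_vmax, true_km)

# Candidate values; the grid is assumed to contain the true parameters
vmax_grid = [0.005, 0.01, 0.02]
km_grid = [0.1, 0.2, 0.4]
best_vmax, best_km = grid_search_fit(observed, 0.5, vmax_grid, km_grid)
print(best_vmax, best_km)
```

In practice, real data are noisy and the parameters are continuous, so a gradient-based or global optimizer, as provided by modeling tools such as COPASI, replaces the grid search.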

... an updated version using the new API syntax will follow soon.

In [10]:
enzmldoc.exportMeasurementData()

{'m0': {'data':      time/sec  replicate_measm0/s1/mM
  0         0.0                     0.0
  1         1.0                     0.0
  2         2.0                     0.0
  3         3.0                     0.0
  4         4.0                     0.0
  ..        ...                     ...
  196     196.0                     0.0
  197     197.0                     0.0
  198     198.0                     0.0
  199     199.0                     0.0
  200     200.0                     0.0
  
  [201 rows x 2 columns],
  'initConc': {'s1': (0.0, 'u1')}},
 'm1': {'data':      time/sec  replicate_measm1/s1/mM
  0         0.0                0.100000
  1         1.0                0.086364
  2         2.0                0.074753
  3         3.0                0.064863
  4         4.0                0.056434
  ..        ...                     ...
  196     196.0                0.000406
  197     197.0                0.000403
  198     198.0                0.000401
  199     199.0            