# _PyEnzyme_ - Import template

#### Usage

- This template demonstrates every accessor needed to extract information from a given EnzymeML document.
- Reduce the template to the variables your application needs and map them to your own data structures.

------------------------------

In [1]:
from pyenzyme.enzymeml.tools import EnzymeMLReader

## Read .omex file

- The reader converts the EnzymeML (.xml) document contained in the .omex archive to an _EnzymeMLDocument_ object
- Entities such as proteins, reactants and reactions are stored in dictionaries:


    ProteinDict
    ReactantDict
    ReactionDict
    UnitDict
    
- These can be accessed via the "enzmldoc" object by its native methods.

In [2]:
path = "Example.omex"
enzmldoc = EnzymeMLReader().readFromFile(path)

## User information

- Information about the creators of a given EnzymeML document is stored within a list of "Creator" objects.

Attributes:
    - Given name
    - Family name
    - E-mail

In [3]:
user_info = enzmldoc.getCreator()

for user in user_info:
    
    given_name = user.getGname()
    family_name = user.getFname()
    mail = user.getMail()
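
The loop above overwrites `given_name`, `family_name` and `mail` on every iteration, so only the last creator survives. A minimal sketch of collecting all creators into a list of dictionaries follows. It uses only the getters shown above; `collect_creators`, `demo_doc` and the sample values are hypothetical stand-ins so that the sketch runs without an .omex file.

```python
from types import SimpleNamespace


def collect_creators(enzmldoc):
    """Return one dict per creator instead of overwriting loop variables."""
    return [
        {
            "given_name": user.getGname(),
            "family_name": user.getFname(),
            "mail": user.getMail(),
        }
        for user in enzmldoc.getCreator()
    ]


# Hypothetical stand-in for a parsed document (made-up values, not PyEnzyme objects).
_creator = SimpleNamespace(
    getGname=lambda: "Jane",
    getFname=lambda: "Doe",
    getMail=lambda: "jane.doe@example.org",
)
demo_doc = SimpleNamespace(getCreator=lambda: [_creator])

creators = collect_creators(demo_doc)
```

With a real document, `collect_creators(enzmldoc)` would be called on the object returned by the reader.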
    

## Unit Definitions

- Each unit defined in an _EnzymeMLDocument_ is stored as a _UnitDef_ object in a dictionary, which can be iterated over.
- Each _UnitDef_ object carries a list of the base units it is composed of, including their exponents (e.g. mole/l => mole ^ 1, l ^ -1)

Attributes:
    - ID: Internal identifiers
    - Name: SI name of the unit
    - Meta ID: SBML related identifier
    - Ontology: URL to ontology describing the unit
    - Baseunits: Single units the UnitDef is made of

In [4]:
enzmldoc.printUnits()

>>> Units
    ID: u0 	 Name: ml +1
    ID: u1 	 Name: l -1 mmole +1
    ID: u2 	 Name: s +1
    ID: u3 	 Name: C +1


In [5]:
for id_, unitdef in enzmldoc.getUnitDict().items():
    
    unit_id = unitdef.getId()
    unit_name = unitdef.getName()
    unit_meta_id = unitdef.getMetaid()
    unit_ontology = unitdef.getOntology()
    baseunits = unitdef.getUnits()
    
    for unit in baseunits:
        
        kind = unit[0]
        exponent = unit[1]
        scale = unit[2]
        multiplier = unit[3]
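
The four-element base unit tuples can be turned into a readable label and a conversion factor. The sketch below assumes the tuple order used above (kind, exponent, scale, multiplier) and assumes that scale and multiplier follow the SBML convention, where each base unit contributes `(multiplier * 10**scale) ** exponent`. Whether PyEnzyme stores the values exactly this way is an assumption; `unit_label`, `conversion_factor` and the sample tuples are hypothetical.

```python
def unit_label(baseunits):
    """Build a compact label such as 'mole^1 litre^-1'."""
    return " ".join(f"{kind}^{exponent}" for kind, exponent, _, _ in baseunits)


def conversion_factor(baseunits):
    """Multiply (multiplier * 10**scale) ** exponent over all base units.

    Assumes SBML semantics for scale and multiplier.
    """
    factor = 1.0
    for _kind, exponent, scale, multiplier in baseunits:
        factor *= (multiplier * 10.0 ** scale) ** exponent
    return factor


# Made-up tuples describing mmole / l; real values come from unitdef.getUnits().
mmole_per_litre = [("mole", 1, -3, 1), ("litre", -1, 0, 1)]

label = unit_label(mmole_per_litre)
factor = conversion_factor(mmole_per_litre)
```

Here `factor` relates the composed unit to its unscaled base kinds, which can help when mapping concentrations into an application that expects a fixed unit.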

## Proteins

- Each protein defined in an _EnzymeMLDocument_ is stored as a _Protein_ object in a dictionary
- These can be accessed via _getProtein_ or by iteration.

Attributes:
    - ID: Internal identifier
    - Name: Systematic name of protein
    - Conc(entration): Value of initial concentration
    - Unit: Name of the concentration unit 
    - Sequence: Protein amino acid sequence
    - Vessel: Name of vessel used in experiment

In [6]:
enzmldoc.printProteins()

>>> Proteins
    ID: p0 	 Name: EnzymeMLase


In [7]:
for id_ in enzmldoc.getProteinDict():
    
    protein = enzmldoc.getProtein(id_)
    
    protein_id = id_
    protein_name = protein.getName()
    protein_conc = protein.getInitConc()
    protein_unit = enzmldoc.getUnitDict()[ protein.getSubstanceUnits() ].getName()
    protein_sequence = protein.getSequence()
    protein_vessel = enzmldoc.getVessel().getName()
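
To map the proteins into an application, the per-protein variables can be gathered into one record per ID, with the unit ID resolved to its name in a single place. This is a minimal sketch that relies only on the accessors shown above; `protein_records`, `demo_doc` and all sample values are hypothetical stand-ins so the example runs without an .omex file.

```python
from types import SimpleNamespace


def protein_records(enzmldoc):
    """Return {protein_id: record}, resolving each unit ID to its name."""
    units = enzmldoc.getUnitDict()
    vessel_name = enzmldoc.getVessel().getName()
    records = {}
    for protein_id in enzmldoc.getProteinDict():
        protein = enzmldoc.getProtein(protein_id)
        records[protein_id] = {
            "name": protein.getName(),
            "init_conc": protein.getInitConc(),
            "unit": units[protein.getSubstanceUnits()].getName(),
            "sequence": protein.getSequence(),
            "vessel": vessel_name,
        }
    return records


# Hypothetical stand-ins with made-up values (not PyEnzyme objects).
_protein = SimpleNamespace(
    getName=lambda: "EnzymeMLase",
    getInitConc=lambda: 1.0,
    getSubstanceUnits=lambda: "u1",
    getSequence=lambda: "MKV",
)
demo_doc = SimpleNamespace(
    getUnitDict=lambda: {"u1": SimpleNamespace(getName=lambda: "mmole / l")},
    getVessel=lambda: SimpleNamespace(getName=lambda: "Vessel 1"),
    getProteinDict=lambda: {"p0": _protein},
    getProtein=lambda protein_id: _protein,
)

records = protein_records(demo_doc)
```

The same pattern should carry over to reactants by swapping in `getReactantDict` and `getReactant` and dropping the sequence field.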

## Reactants

- Each reactant defined in an _EnzymeMLDocument_ is stored as a _Reactant_ object in a dictionary
- These can be accessed via _getReactant_ or by iteration.

Attributes:
    - ID: Internal identifier
    - Name: Systematic name of reactant
    - Conc(entration): Value of initial concentration
    - Unit: Name of the concentration unit
    - Vessel: Name of vessel used in experiment
    - InChI: InChI-encoded substance structure
    - SMILES: SMILES-encoded substance structure

In [8]:
enzmldoc.printReactants()

>>> Reactants
    ID: s0 	 Name: Reactant1


In [9]:
for id_ in enzmldoc.getReactantDict():
    
    reactant = enzmldoc.getReactant(id_)
    
    reactant_id = id_
    reactant_name = reactant.getName()
    reactant_conc = reactant.getInitConc()
    reactant_unit = enzmldoc.getUnitDict()[ reactant.getSubstanceUnits() ].getName()
    reactant_vessel = enzmldoc.getVessel().getName()
    
    # NOT INCLUDED IN DEMO OMEX
    #reactant_inchi = reactant.getInchi()
    #reactant_smiles = reactant.getSmiles()

## Reactions

- Each reaction defined in an _EnzymeMLDocument_ is stored as an _EnzymeReaction_ object in a dictionary
- These can be accessed via _getReaction_ or by iteration. 
- Besides reaction conditions, lists of educts/products/modifiers define which substances participate in the reaction. Each entry holds:

        - Reactant/Protein identifier
        - Stoichiometry
        - Whether or not the substance concentration is constant
        - Replicate data
        - Initial concentrations

Attributes:
    - ID: Internal identifier
    - Name: Reaction name
    - Temperature: Value of given temperature
    - Temperature Unit: Unit of given temperature
    - pH: pH value
    - educts/products/modifiers: Lists of tuples (reactant ID, stoichiometry, isConstant, list of replicates, list of initial concentrations)
    
Replicate object

    - Data: Pandas series of time course data
    - Unit: Unit of replicate
    - ID: Unique replicate ID
    - Time unit: Unit of time
    - Type: Data type (e.g. "conc" for concentration)
        

In [10]:
enzmldoc.printReactions()

>>> Reactions
    ID: r0 	 Name: Reaction1


In [13]:
for id_ in enzmldoc.getReactionDict():
    
    reaction = enzmldoc.getReaction(id_, by_id=True)
    
    reaction_id = id_
    reaction_name = reaction.getName()
    reaction_ph = reaction.getPh()
    reaction_temp = reaction.getTemperature()
    reaction_unit = reaction.getTempunit()
    reaction_educts = reaction.getEducts()
    reaction_products = reaction.getProducts()
    reaction_modifiers = reaction.getModifiers()
    
    
    ############## EDUCTS ##############
    
    for reactant_id, stoich, _, replicates, init_concs in reaction_educts:
        
        reactant = enzmldoc.getReactant(reactant_id)  # Return Reactant object
        stoichiometry = stoich
        raw_data = reaction.exportReplicates( reactant_id ) # exports all time course data of said reactant to Pandas
        
        # access all individual replicates
        for replicate in replicates:
            
            replicate_data = replicate.getData() # Pandas Series object 
            replicate_unit = enzmldoc.getUnitDict()[ replicate.getDataUnit() ]
            replicate_id = replicate.getReplica()
            replicate_timeunit = enzmldoc.getUnitDict()[ replicate.getTimeUnit() ]
            replicate_type = replicate.getType()
            replicate_measurement = replicate.getMeasurement()
            replicate_initConc = replicate.getInitConc()
            
            
    ############## PRODUCTS ##############
    
    for reactant_id, stoich, _, replicates, init_concs in reaction_products:
        
        reactant = enzmldoc.getReactant(reactant_id)  # Return Reactant object
        stoichiometry = stoich
        raw_data = reaction.exportReplicates( reactant_id ) # exports all time course data of said reactant to Pandas
        
        # access all individual replicates
        for replicate in replicates:
            
            replicate_data = replicate.getData() # Pandas Series object 
            replicate_unit = enzmldoc.getUnitDict()[ replicate.getDataUnit() ]
            replicate_id = replicate.getReplica()
            replicate_timeunit = enzmldoc.getUnitDict()[ replicate.getTimeUnit() ]
            replicate_type = replicate.getType()
            replicate_measurement = replicate.getMeasurement()
            replicate_initConc = replicate.getInitConc()
       
    
    ############## MODIFIERS ##############
    
    for reactant_id, stoich, _, replicates, init_concs in reaction_modifiers:
        
        if reactant_id.startswith('s'):
            reactant = enzmldoc.getReactant(reactant_id)  # Return Reactant object
        elif reactant_id.startswith('p'):
            reactant = enzmldoc.getProtein(reactant_id)  # Return Protein object
        stoichiometry = stoich
        
        # access all individual replicates
        for replicate in replicates:
                        
            replicate_data = replicate.getData() # Pandas Series object 
            replicate_unit = enzmldoc.getUnitDict()[ replicate.getDataUnit() ]
            replicate_id = replicate.getReplica()
            replicate_timeunit = enzmldoc.getUnitDict()[ replicate.getTimeUnit() ]
            replicate_type = replicate.getType()
            replicate_measurement = replicate.getMeasurement()
            replicate_initConc = replicate.getInitConc()
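
The educt, product and modifier loops above share the same structure, so they can be folded into one generator when only the replicates are needed. The sketch below assumes the five-element tuple layout unpacked above; `iter_replicates`, `demo_reaction` and the sample IDs are hypothetical stand-ins so the example runs without an .omex file.

```python
from types import SimpleNamespace


def iter_replicates(reaction):
    """Yield (role, species_id, stoichiometry, replicate) for every replicate."""
    roles = (
        ("educt", reaction.getEducts()),
        ("product", reaction.getProducts()),
        ("modifier", reaction.getModifiers()),
    )
    for role, elements in roles:
        for species_id, stoich, _is_constant, replicates, _init_concs in elements:
            for replicate in replicates:
                yield role, species_id, stoich, replicate


# Hypothetical stand-ins with made-up values (not PyEnzyme objects).
_replicate = SimpleNamespace(getReplica=lambda: "repl_s0_0")
demo_reaction = SimpleNamespace(
    getEducts=lambda: [("s0", 1.0, False, [_replicate], [])],
    getProducts=lambda: [("s1", 1.0, False, [], [])],
    getModifiers=lambda: [("p0", 1.0, True, [], [])],
)

rows = [
    (role, species_id, replicate.getReplica())
    for role, species_id, _stoich, replicate in iter_replicates(demo_reaction)
]
```

Species without replicates (here `s1` and `p0`) yield nothing, so the caller only sees entries that carry time course data.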
            