# KAMI Tutorial



The bio-curation framework KAMI aims to decouple knowledge curation from model building, making it possible to build large signalling models from curated knowledge. It provides tools for the semi-automated aggregation of fragmented knowledge on individual protein-protein interactions (PPIs) into knowledge corpora. These corpora can be used for the automatic generation of dynamical rule-based models of cellular signalling in different contexts.

KAMI provides a de-contextualized knowledge representation of PPIs taking part in cellular signalling. Such de-contextualization consists in representing not the actual interactions occurring between concrete molecules, but rather the _minimal requirements_ for various interaction mechanisms to be realized. These minimal requirements range from purely structural, such as the presence or absence of specific protein domains or key residues, to phenomenological, such as the activation of proteins or their functional sites.

The de-contextualization can be achieved by abstracting from the notion of a protein to the notion of a _protoform_ as the agent of a PPI. A protoform does not represent a concrete molecule, but the set of all product molecules that can be realized from a particular gene (as the result of translation and various PTMs). Therefore, an agent of interaction in KAMI represents _constraints_ on a neighbourhood in the sequence space of a gene (e.g. splice variants and mutants) together with all the combinations of PTMs (e.g. phosphorylation of residues) and phenomenological states (e.g. activity). This implies that KAMI represents knowledge on _potential_ individual PPIs that may or may not be realized in different cellular contexts, which allows KAMI to reuse the same knowledge corpus to generate models for these different contexts.
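
To make the idea of minimal requirements more concrete, the following toy sketch encodes a concrete protein product as plain Python data and checks it against the requirements an interaction places on one of its agents. This is not the KAMI API: the dictionary encoding and the `satisfies` function are hypothetical, and the two examples are borrowed from statements used later in this tutorial (an active EGFR kinase domain; a GRB2 SH2 domain with S, not D, at position 90).

```python
# Hypothetical toy model, not the KAMI API: a concrete protein product and
# the minimal requirements an interaction places on one of its agents.

def satisfies(product, requirements):
    """Return True if `product` meets every requirement in `requirements`."""
    # Structural: all required regions must be present in the product
    if not requirements.get("regions", set()) <= product["regions"]:
        return False
    # Structural: key residues, location -> set of allowed amino acids
    for loc, allowed in requirements.get("residues", {}).items():
        if product["residues"].get(loc) not in allowed:
            return False
    # Phenomenological: required states, e.g. {"activity": True}
    for name, value in requirements.get("states", {}).items():
        if product["states"].get(name) != value:
            return False
    return True

# Enzyme requirement: an active kinase region
kinase_req = {"regions": {"Protein kinase"}, "states": {"activity": True}}
# Binding requirement: an SH2 domain with S (not D) at position 90
sh2_req = {"regions": {"SH2"}, "residues": {90: {"S"}}}

active_egfr = {"regions": {"Protein kinase"}, "residues": {}, "states": {"activity": True}}
inactive_egfr = {"regions": {"Protein kinase"}, "residues": {}, "states": {"activity": False}}
grb2_s90 = {"regions": {"SH2"}, "residues": {90: "S"}, "states": {}}
grb2_d90 = {"regions": {"SH2"}, "residues": {90: "D"}, "states": {}}

print(satisfies(active_egfr, kinase_req), satisfies(inactive_egfr, kinase_req))  # True False
print(satisfies(grb2_s90, sh2_req), satisfies(grb2_d90, sh2_req))  # True False
```

A single set of requirements thus covers many concrete molecules at once, which is what lets one corpus serve several contexts.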

Read more about the KAMI framework in this [article](https://www.ncbi.nlm.nih.gov/pubmed/30908261).

In [1]:
import json

from kami import KamiCorpus
from kami.data_structures.entities import (Protoform, Region, Site, Residue,
                                           State, Protein, RegionActor, SiteActor)
from kami.data_structures.definitions import (Definition, Product)
from kami.data_structures.interactions import *

In [2]:
from kami.data_structures import entities

## Creating a corpus

KAMI distinguishes two types of knowledge bodies: a _knowledge corpus_ and a _model_. Corpora contain de-contextualized knowledge: agents of interactions are protoforms, and the regions, residues and states associated with a protoform define the feasible set of its variants. Interactions in a corpus represent potential interactions together with the necessary conditions for them to occur.

A KAMI corpus is a hierarchy of graphs (see [this article](https://arxiv.org/abs/2002.01766) for more details) that contains the following components:

- The _meta-model_ graph defines the kinds of entities that can exist in a system.
- The _action graph_ represents a global roadmap containing the 'anatomy' of protoforms, their states, PTMs and all potential interactions present in the knowledge corpus. Every node of the action graph is typed by a node in the meta-model through a graph homomorphism.
- The collection of _nuggets_ encodes rules for potential PPIs; each nugget specifies the necessary conditions for an interaction between protoforms. All the nugget graphs are mapped to the action graph through a collection of homomorphisms, which identify the entities and actions represented in different nuggets with entities and interactions in the action graph.
- The _semantic action graph_ represents a roadmap containing background knowledge on kinds of conserved protein domains and their generic interaction mechanisms. As in the case of the action graph, every node of the semantic action graph is typed by the meta-model through a homomorphism. The relation between the action graph and the semantic action graph associates entities and actions present in the action graph with their semantics.
- The collection of _semantic nuggets_ encodes individual semantic PPI mechanisms of conserved protein regions. All the entities and actions of the semantic nuggets are instances of entities and actions in the semantic action graph. A collection of relations associates entities and actions in nuggets with their semantics.
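
Typing by a homomorphism means that every node is mapped to a node of the typing graph and every edge is sent to an edge. The plain-Python sketch below checks this property for a tiny action graph typed by a meta-model. Both graphs are simplified and hypothetical (KAMI's actual meta-model contains more node kinds and edges), and `is_homomorphism` is our own helper, not part of KAMI.

```python
# Simplified, hypothetical meta-model: node kinds and allowed edges between them
META_NODES = {"protoform", "region", "residue", "state", "mod"}
META_EDGES = {
    ("region", "protoform"), ("residue", "protoform"), ("residue", "region"),
    ("state", "region"), ("state", "residue"), ("region", "mod"), ("mod", "state"),
}

# Tiny action graph for "active EGFR kinase phosphorylates EGFR Y1092"
AG_NODES = {"EGFR", "EGFR_kinase", "EGFR_kinase_activity",
            "EGFR_Y1092", "EGFR_Y1092_phosphorylation", "mod"}
AG_EDGES = {
    ("EGFR_kinase", "EGFR"), ("EGFR_kinase_activity", "EGFR_kinase"),
    ("EGFR_Y1092", "EGFR"), ("EGFR_Y1092_phosphorylation", "EGFR_Y1092"),
    ("EGFR_kinase", "mod"), ("mod", "EGFR_Y1092_phosphorylation"),
}

def is_homomorphism(src_nodes, src_edges, tgt_nodes, tgt_edges, mapping):
    """True if `mapping` is total on src_nodes, lands in tgt_nodes and preserves edges."""
    if set(mapping) != set(src_nodes):
        return False
    if any(mapping[n] not in tgt_nodes for n in src_nodes):
        return False
    return all((mapping[u], mapping[v]) in tgt_edges for u, v in src_edges)

typing = {
    "EGFR": "protoform", "EGFR_kinase": "region", "EGFR_kinase_activity": "state",
    "EGFR_Y1092": "residue", "EGFR_Y1092_phosphorylation": "state", "mod": "mod",
}
print(is_homomorphism(AG_NODES, AG_EDGES, META_NODES, META_EDGES, typing))  # True

# Typing the "mod" action as a state breaks the edge (EGFR_kinase, mod)
bad_typing = dict(typing, mod="state")
print(is_homomorphism(AG_NODES, AG_EDGES, META_NODES, META_EDGES, bad_typing))  # False
```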

The following listing shows how a KAMI corpus can be created. KAMI requires each corpus to be assigned an identifier (e.g. 'EGFR_signalling' in the listing below).

In [3]:
# Create an empty KAMI corpus based on in-memory graphs
corpus = KamiCorpus("EGFR_signalling", backend="networkx")

## Adding protein-protein interactions to the corpus

KAMI provides two modules for expressing PPIs, their agents and components: `kami.data_structures.entities` and 
`kami.data_structures.interactions`.

The Python listing below illustrates how the entity and interaction data structures provided by the KAMI library can be used for manual input of PPIs. In addition, it shows how such objects can be serialized to and de-serialized from JSON. The interaction object created in the listing corresponds to the statement:

_'A protein product of EGFR can phosphorylate residue Y1092 of another EGFR molecule through its active kinase domain, when the two molecules are bound'_.

In [4]:
# Create a protoform
egfr = Protoform("P00533")

# Create a region actor
egfr_kinase = RegionActor(
    protoform=egfr,
    region=Region(name="Protein kinase", start=712, end=979,
                  states=[State("activity", True)]))

# Create a ligand modification interaction
interaction = LigandModification(
    enzyme=egfr_kinase, substrate=egfr,
    target=Residue("Y", 1092, state=State("phosphorylation", False)), value=True,
    rate=1, desc="Phosphorylation of EGFR homodimer")

In [5]:
# Convert interaction to JSON
int_json = interaction.to_json()
print(json.dumps(int_json, indent=" "))

# Convert JSON back to interaction
copy_interaction = Interaction.from_json(int_json)

{
 "type": "LigandModification",
 "enzyme": {
  "data": {
   "protoform": {
    "uniprotid": "P00533",
    "regions": [],
    "sites": [],
    "residues": [],
    "states": [],
    "bound_to": [],
    "unbound_from": []
   },
   "region": {
    "name": "Protein kinase",
    "start": 712,
    "end": 979,
    "sites": [],
    "residues": [],
    "states": [
     {
      "name": "activity",
      "test": true
     }
    ],
    "bound_to": [],
    "unbound_from": []
   }
  },
  "type": "RegionActor"
 },
 "substrate": {
  "data": {
   "uniprotid": "P00533",
   "regions": [],
   "sites": [],
   "residues": [],
   "states": [],
   "bound_to": [],
   "unbound_from": []
  },
  "type": "Protoform"
 },
 "target": {
  "type": "Residue",
  "data": {
   "aa": [
    "Y"
   ],
   "test": [
    true
   ],
   "loc": 1092,
   "state": {
    "name": "phosphorylation",
    "test": false
   }
  }
 },
 "value": true,
 "enzyme_bnd_subactor": "protoform",
 "substrate_bnd_subactor": "protoform",
 "rate": 1,
 "d

An interaction object can be added to the corpus using the `add_interaction` method. 

Given an individual PPI expressed as an interaction object, KAMI aggregates the new knowledge into the underlying corpus in a context-dependent fashion, performing the following sequence of steps:
- a nugget graph is generated;
- entities and actions of the nugget that are already present in the action graph are identified;
- the nugget is added to the corpus and new knowledge is propagated to the action graph;
- bookkeeping updates, such as anatomization of new genes and reconnection of spatially nested components, are performed;
- updates specific to the semantics of the provided PPI are performed.
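
The identification and propagation steps can be pictured with a deliberately simplified sketch on plain dictionaries. It assumes a hypothetical `key` function that decides when two nodes denote the same entity; KAMI's actual identification procedure is more elaborate, and the bookkeeping and semantic updates are omitted here. Note how the two EGFR copies of a homodimer nugget end up identified with a single action graph node.

```python
def aggregate(action_graph, nugget, key):
    """Merge `nugget` into `action_graph` in place; return the nugget -> AG typing.

    Both graphs are {"nodes": {node_id: attrs}, "edges": set of (u, v)}.
    `key(attrs)` is a hypothetical identity function deciding which nugget
    nodes are already present in the action graph.
    """
    index = {key(attrs): n for n, attrs in action_graph["nodes"].items()}
    typing = {}
    for n, attrs in nugget["nodes"].items():
        k = key(attrs)
        if k not in index:
            # New knowledge: add a fresh node to the action graph
            new_id = n
            while new_id in action_graph["nodes"]:
                new_id += "'"
            action_graph["nodes"][new_id] = dict(attrs)
            index[k] = new_id
        # Identify the nugget node with its (possibly new) action graph node
        typing[n] = index[k]
    # Propagate the nugget's edges through the typing
    for u, v in nugget["edges"]:
        action_graph["edges"].add((typing[u], typing[v]))
    return typing

def key(attrs):
    return (attrs["type"], attrs["name"])

ag = {"nodes": {}, "edges": set()}
# First nugget: two copies of EGFR (enzyme and substrate) and the kinase region
n1 = {"nodes": {"EGFR": {"type": "protoform", "name": "P00533"},
                "EGFR_1": {"type": "protoform", "name": "P00533"},
                "kinase": {"type": "region", "name": "P00533 kinase"}},
      "edges": {("kinase", "EGFR")}}
t1 = aggregate(ag, n1, key)
print(t1)  # {'EGFR': 'EGFR', 'EGFR_1': 'EGFR', 'kinase': 'kinase'}

# Second nugget: EGFR again, plus a new protoform (GRB2)
n2 = {"nodes": {"egfr": {"type": "protoform", "name": "P00533"},
                "grb2": {"type": "protoform", "name": "P62993"}},
      "edges": set()}
t2 = aggregate(ag, n2, key)
print(t2)  # {'egfr': 'EGFR', 'grb2': 'grb2'}
print(sorted(ag["nodes"]))  # ['EGFR', 'grb2', 'kinase']
```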

In [6]:
# Add interaction to the corpus
new_nugget_id = corpus.add_interaction(interaction)

KAMI provides various utilities for accessing the components of the corpus, e.g. a nugget graph object, the action graph, the identification of entities and actions of a nugget in the action graph, the typing of the action graph by the meta-model, etc. In addition, KAMI implements a set of utilities for manually adding new entities and actions to the action graph (independently of the aggregation process).

In [7]:
# Access the newly created nugget graph
nugget = corpus.get_nugget(new_nugget_id)
print("Description: ", corpus.get_nugget_desc(new_nugget_id))
print("Nodes of the nugget: ")
for n in nugget.nodes():
    print("\t", n)

# Access the action graph
ag = corpus.action_graph
print("Nodes of the action graph: ")
for n in ag.nodes():
    print("\t", n)

# Get the identification of nugget nodes in the action graph 
print("\nIdentification of nugget nodes:", corpus.get_nugget_typing(new_nugget_id))

# Get typing of the action graph by the meta-model
print("\nTyping of action graph by the meta-model: ", corpus.get_action_graph_typing())

# Get all the protoforms in the corpus
print("\nAll the protoform nodes in the corpus:")
for p in corpus.protoforms():
    print("\t", p, corpus.action_graph.get_node(p))

# Find a protoform node by the UniProt AC of its gene
egfr_protoform_node_id = corpus.get_protoform_by_uniprot("P00533")

Description:  Phosphorylation of EGFR homodimer
Nodes of the nugget: 
	 P00533
	 P00533_region_Protein kinase_712_979
	 P00533_region_Protein kinase_712_979_activity
	 P00533_1
	 mod
	 P00533_1_Y1092
	 P00533_1_Y1092_phosphorylation
	 is_bnd
Nodes of the action graph: 
	 P00533_region_Protein kinase_712_979_activity
	 is_bnd
	 P00533_1_Y1092
	 P00533_1_Y1092_phosphorylation
	 mod
	 P00533_region_Protein kinase_712_979
	 P00533_P00533_1
	 region_1
	 region_2
	 region_4
	 region_5

Identification of nugget nodes: {'P00533': 'P00533_P00533_1', 'P00533_region_Protein kinase_712_979': 'P00533_region_Protein kinase_712_979', 'P00533_region_Protein kinase_712_979_activity': 'P00533_region_Protein kinase_712_979_activity', 'P00533_1': 'P00533_P00533_1', 'mod': 'mod', 'P00533_1_Y1092': 'P00533_1_Y1092', 'P00533_1_Y1092_phosphorylation': 'P00533_1_Y1092_phosphorylation', 'is_bnd': 'is_bnd'}

Typing of action graph by the meta-model:  {'P00533_region_Protein kinase_712_979_activity': 'state', 'is
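
In the identification printed above, the two EGFR nodes of the nugget (`P00533` and `P00533_1`) are both mapped to the single protoform node `P00533_P00533_1` of the action graph, so the typing of a homodimer nugget is not injective. A short plain-Python check over the printed dictionary makes this explicit:

```python
from collections import defaultdict

# Identification of nugget nodes, copied from the output above
identification = {
    'P00533': 'P00533_P00533_1',
    'P00533_region_Protein kinase_712_979': 'P00533_region_Protein kinase_712_979',
    'P00533_region_Protein kinase_712_979_activity': 'P00533_region_Protein kinase_712_979_activity',
    'P00533_1': 'P00533_P00533_1',
    'mod': 'mod',
    'P00533_1_Y1092': 'P00533_1_Y1092',
    'P00533_1_Y1092_phosphorylation': 'P00533_1_Y1092_phosphorylation',
    'is_bnd': 'is_bnd',
}

# Invert the typing: action graph node -> nugget nodes mapped onto it
preimages = defaultdict(list)
for nugget_node, ag_node in identification.items():
    preimages[ag_node].append(nugget_node)

# Report action graph nodes with more than one preimage
for ag_node, nugget_nodes in preimages.items():
    if len(nugget_nodes) > 1:
        print(ag_node, "<-", nugget_nodes)
# prints: P00533_P00533_1 <- ['P00533', 'P00533_1']
```

In the notebook itself, the same dictionary can be obtained directly with `corpus.get_nugget_typing(new_nugget_id)`, as in the listing above.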

In [8]:
# Manually add a new protoform and its site to the corpus
new_protoform_node = corpus.add_protoform(Protoform("P62993"))
corpus.add_site(Site("New site"), new_protoform_node)
print(corpus.get_attached_sites(new_protoform_node))

print("\nUpdated protoform nodes in the corpus:")
for p in corpus.protoforms():
    print("\t", p, corpus.action_graph.get_node(p))

['P62993_site_New site']

Updated protoform nodes in the corpus:
	 P00533_P00533_1 {'uniprotid': {'P00533'}, 'hgnc_symbol': {'EGFR'}, 'synonyms': {'ERBB', 'ERBB1', 'HER1'}}
	 P62993 {'uniprotid': {'P62993'}, 'hgnc_symbol': {'GRB2'}, 'synonyms': {'ASH'}}


The following listing adds to the corpus the interactions corresponding to the following statements:

- _'A protein product of SHC1 can bind to the SH2 domain of a GRB2 protein through its site pY having the residue Y317 phosphorylated.'_
- _'A protein product of EGFR can bind to the SH2 domain of a GRB2 protein through its site pY having the residue Y1092 phosphorylated. This interaction happens when the SH2 domain of GRB2 has the key residue S90, but not D90.'_

In [9]:
grb2 = Protoform("P62993")
grb2_sh2 = RegionActor(
    protoform=grb2,
    region=Region(name="SH2"))

shc1 = Protoform("P29353")
shc1_pY = SiteActor(
    protoform=shc1,
    site=Site(
        name="pY",
        residues=[Residue("Y", 317, State("phosphorylation", True))]))
interaction1 = Binding(grb2_sh2, shc1_pY)

grb2_sh2_with_residues = RegionActor(
    protoform=grb2,
    region=Region(
        name="SH2",
        residues=[
            Residue("S", 90, test=True),
            Residue("D", 90, test=False)]))

egfr_pY = SiteActor(
    protoform=egfr,
    site=Site(
        name="pY",
        residues=[Residue("Y", 1092, State("phosphorylation", True))]))

interaction2 = Binding(egfr_pY, grb2_sh2_with_residues)

corpus.add_interactions([interaction1, interaction2])

['EGFR_signalling_nugget_2', 'EGFR_signalling_nugget_3']

## Importing interactions from BioPAX and INDRA Statements

KAMI provides an importer for PPIs represented in the BioPAX format. The following listing illustrates how KAMI interaction objects can be created from a BioPAX model stored in an `.owl` file.

In [10]:
from kami.importers.biopax import BioPaxImporter

# Convert BioPax model into KAMI interactions
bp_importer = BioPaxImporter()
biopax_interactions = bp_importer.import_model("data/PathwayCommons11.pid.BIOPAX.owl")

# Add the first 10 imported interactions to the corpus
corpus.add_interactions(biopax_interactions[:10])


['EGFR_signalling_nugget_4',
 'EGFR_signalling_nugget_5',
 'EGFR_signalling_nugget_6',
 'EGFR_signalling_nugget_7',
 'EGFR_signalling_nugget_8',
 'EGFR_signalling_nugget_9',
 'EGFR_signalling_nugget_10',
 'EGFR_signalling_nugget_11',
 'EGFR_signalling_nugget_12',
 'EGFR_signalling_nugget_13']

Moreover, KAMI can convert INDRA statement objects into native entity and interaction objects. In the following listing, a text containing a mechanistic description of PPIs is processed into statement objects using INDRA's TRIPS processor. These objects are then converted into interactions and added to the corpus.

In [None]:
from indra.sources import trips
from kami.importers.indra import IndraImporter


text = (
    "MAP2K1 phosphorylates MAPK3 at Thr-202 and Tyr-204; "
    "ABL1 phosphorylates PLCG1 at Y394.")

# Process text using INDRA's TRIPS processor
trips_processor = trips.process_text(text)
# Get INDRA statements
statements = trips_processor.statements

# Convert statements into KAMI interactions
indra_importer = IndraImporter()
indra_interactions = indra_importer.import_statements(statements)

# Add interactions to the corpus
corpus.add_interactions(indra_interactions) 

## Creating protein definitions and instantiating a model

KAMI allows the curator to accommodate knowledge about different variants of proteins (for instance, splice variants or mutants) in the form of _protein definitions_. Such protein definitions can be used to specify the 'anatomy' of variants, for example the loss of functional sites or amino acid replacements. Protein definitions are used when instantiating _concrete signalling models_ from a knowledge corpus. As a result of such instantiation, some of the potential PPIs present in the corpus may not be realized.

KAMI provides the `Definition` data structure for creating protein definitions. The constructor of `Definition` takes as input a protoform object and a list of `Product` objects. The latter define which components are removed from the protoform and which amino acids occupy its key residues in particular protein products. The following listing creates a protein definition for GRB2 with three products: Ash-L (S at position 90), the S90D mutant, and Grb3 (lacking the SH2 domain). The created protein definition can then be used to instantiate a concrete signalling model.

In [None]:
# Create a protein definition for GRB2
protoform = Protoform(
    "P62993",
    regions=[Region(
        name="SH2",
        residues=[
            Residue("S", 90, test=True),
            Residue("D", 90, test=False)])])

ashl = Product(name="Ash-L", residues=[Residue("S", 90)])
s90d = Product(name="S90D", residues=[Residue("D", 90)])
grb3 = Product(name="Grb3", removed_components={"regions": [Region("SH2")]})

grb2_definition = Definition(protoform, products=[ashl, s90d, grb3])
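
The effect of the three products on the anatomy of GRB2 can be mimicked with plain dictionaries. This is a hypothetical toy model, not KAMI's `Definition`/`Product` implementation: the encoding (region name mapped to its key residues) and the `build_variant` helper are our own.

```python
# Hypothetical toy model, not KAMI's Definition/Product classes.
# Protoform anatomy: region name -> {key residue location: amino acid or None}
GRB2_PROTOFORM = {"SH2": {90: None}}

def build_variant(protoform, removed_regions=(), residues=None):
    """Return a product's anatomy: drop removed regions, then set key residues.

    Residues located in a removed region are silently ignored.
    """
    variant = {name: dict(res) for name, res in protoform.items()
               if name not in removed_regions}
    for loc, aa in (residues or {}).items():
        for region_residues in variant.values():
            if loc in region_residues:
                region_residues[loc] = aa
    return variant

products = {
    "Ash-L": build_variant(GRB2_PROTOFORM, residues={90: "S"}),
    "S90D": build_variant(GRB2_PROTOFORM, residues={90: "D"}),
    "Grb3": build_variant(GRB2_PROTOFORM, removed_regions={"SH2"}),
}
for name, anatomy in products.items():
    print(name, anatomy)
# Ash-L {'SH2': {90: 'S'}}
# S90D {'SH2': {90: 'D'}}
# Grb3 {}
```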

The mechanism of knowledge instantiation allows the curator to _reuse_ a knowledge corpus in different cellular contexts. Such contexts can be specified by a set of protein definitions.

In [None]:
# Instantiate a model for the corpus
grb_variants_model = corpus.instantiate("EGFR_signalling_GRB2", [grb2_definition])
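
Conceptually, instantiation keeps a potential interaction only for those variants whose anatomy satisfies the interaction's requirements. The sketch below illustrates this filtering for the two GRB2 binding interactions added earlier (SHC1 pY binds the SH2 domain; EGFR pY binds the SH2 domain only with S90). It is a hypothetical simplification: the encoding, the requirement predicates and `instantiate_toy` are our own, and KAMI's `instantiate` rewrites the corpus graphs rather than filtering a dictionary.

```python
# GRB2 variant anatomies (region -> key residues), as in the definition above
variants = {"Ash-L": {"SH2": {90: "S"}}, "S90D": {"SH2": {90: "D"}}, "Grb3": {}}

# Hypothetical encoding of the requirements each nugget places on GRB2
nugget_requirements = {
    "GRB2 SH2 binds SHC1 pY": lambda v: "SH2" in v,
    "GRB2 SH2 binds EGFR pY": lambda v: v.get("SH2", {}).get(90) == "S",
}

def instantiate_toy(variants, nugget_requirements):
    """Map each nugget to the sorted list of variants in which it is realized."""
    return {nugget: sorted(name for name, anatomy in variants.items() if req(anatomy))
            for nugget, req in nugget_requirements.items()}

realized = instantiate_toy(variants, nugget_requirements)
for nugget, realized_by in realized.items():
    print(f"{nugget}: {realized_by or 'not realized'}")
# GRB2 SH2 binds SHC1 pY: ['Ash-L', 'S90D']
# GRB2 SH2 binds EGFR pY: ['Ash-L']
```

In this reading, Grb3 takes part in neither binding because it lacks the SH2 domain, and the S90D mutant loses only the EGFR binding.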

## Generating Kappa

Instantiated KAMI models can be used to generate scripts written in the rule-based modelling language _Kappa_ (compatible with version 4 of the Kappa language and its simulator KaSim4). Given an instantiated model, KAMI generates Kappa scripts containing agent signatures, interaction rules and initial conditions. To be used for stochastic simulations, such scripts should be further augmented with observables, which specify the patterns of interest (particular agents in some combination of states or bonds) whose quantitative dynamics should be tracked by the Kappa simulator (see kappalanguage.org).
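
Since the generated scripts do not contain observables, they have to be appended to the script string. Below is a minimal sketch of a hypothetical helper, `add_observables`, that appends `%obs:` declarations written in the KaSim 4 syntax as we understand it (`%obs: 'label' |pattern|`, with `{...}` for internal states and `[n]` for bonds); please verify it against the Kappa manual. The agent and site names in the example are assumptions and must be checked against the agent signatures actually generated by KAMI.

```python
def add_observables(kappa_script, observables):
    """Append Kappa %obs declarations to a generated script.

    observables: list of (label, pattern) pairs in KaSim 4 syntax.
    """
    lines = [f"%obs: '{label}' |{pattern}|" for label, pattern in observables]
    return kappa_script.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"

# Hypothetical agent and site names: check them against the agent
# signatures of the script actually produced by KAMI.
observables = [
    ("pEGFR", "EGFR(Y1092{p})"),
    ("EGFR_GRB2", "EGFR(pY[1]), GRB2(SH2[1])"),
]
script = add_observables("%init: 100 EGFR()\n", observables)
print(script)
```

Applied to the output of a generator, the call would read `kappa_str = add_observables(kappa_str, observables)`.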


The module `kami.exporters.kappa` provides a set of utilities for generating executable Kappa scripts from both instantiated models and knowledge corpora (given protein definitions for the protoforms present in the corpora). The listing below defines initial conditions for the protein products of EGFR, GRB2 and SHC1. Such conditions specify the number of molecules in different states of the corresponding proteins in the initial mixture. For example, the listing defines the following initial amounts of EGFR products in the mixture:

- 150 molecules of the canonical EGFR protein (no PTMs, bonds or activity);
- 75 molecules of the EGFR protein with the active kinase domain;
- 30 molecules of the EGFR protein with the phosphorylated Y1092;
- 30 molecules of the EGFR protein with the phosphorylated Y1092 and bound to the SH2 domain of Ash-L through its pY site;
- 30 instances of the EGFR protein dimer (EGFR bound to another EGFR).

In [None]:
from kami.exporters.kappa import KappaInitialCondition


# Initial condition for EGFR
egfr_initial = KappaInitialCondition(
    canonical_protein=Protein(Protoform("P00533")),
    canonical_count=150,
    stateful_components=[
        (egfr_kinase, 75),
        (Residue("Y", 1092, state=State("phosphorylation", True)), 30),
        (Site(
            name="pY",
            residues=[Residue("Y", 1092,
                              state=State("phosphorylation", True))],
            bound_to=[
                RegionActor(
                    protoform=grb2, region=Region(name="SH2"),
                    variant_name="Ash-L")
            ]), 30)
    ],
    bonds=[
        (Protein(Protoform("P00533")), 30),
    ])

# Initial condition for Ash-L
ashl_initial = KappaInitialCondition(
    canonical_protein=Protein(Protoform("P62993"), "Ash-L"),
    canonical_count=200,
    stateful_components=[
        (Region(name="SH2", bound_to=[shc1_pY]), 40)])

# Initial condition for S90D
s90d_initial = KappaInitialCondition(
    canonical_protein=Protein(Protoform("P62993"), "S90D"),
    canonical_count=20,
    stateful_components=[
        (Region(name="SH2", bound_to=[egfr_pY]), 10)])

# Initial condition for Grb3
grb3_initial = KappaInitialCondition(
    canonical_protein=Protein(Protoform("P62993"), "Grb3"),
    canonical_count=70)

# Initial condition for SHC1
shc1_initial = KappaInitialCondition(
    canonical_protein=Protein(Protoform("P29353")),
    canonical_count=100,
    stateful_components=[
        (Residue("Y", 317, state=State("phosphorylation", True)), 30)])
        
initial_concentrations = [
    egfr_initial,
    ashl_initial,
    s90d_initial,
    grb3_initial,
    shc1_initial
]

The two listings below illustrate how the `ModelKappaGenerator` and `CorpusKappaGenerator` classes can be used to generate Kappa scripts from the previously defined model and corpus, respectively. The generated scripts include initial conditions corresponding to the concentrations defined in the previous listing. The `default_concentration` argument assigns a default concentration to canonical agents (with no PTMs or bonds) that are not mentioned in the `initial_concentrations` parameter.

In [None]:
from kami.exporters.kappa import ModelKappaGenerator

# Create a Kappa generator from a model
generator = ModelKappaGenerator(
    grb_variants_model)
# Generate Kappa with default agent
# concentration 75 molecules per agent
kappa_str = generator.generate(
    initial_concentrations,
    default_concentration=75)
print(kappa_str)

In [None]:
from kami.exporters.kappa import CorpusKappaGenerator

# Create a Kappa generator from a corpus
generator = CorpusKappaGenerator(corpus, [grb2_definition])
# Generate Kappa with default agent
# concentration 75 molecules per agent
kappa_str = generator.generate(
    initial_concentrations,
    default_concentration=75)
print(kappa_str)
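
Our reading of the `default_concentration` argument can be summarized by a small hypothetical helper: agents with an explicit initial condition keep their canonical count, while every other agent falls back to the default. This is an illustrative sketch of the described behaviour, not the exporter's code; the agent labels are our own, and MAPK3 stands for any agent of the corpus without an initial condition (for example, one imported from the INDRA statements).

```python
def canonical_counts(agents, explicit_counts, default_concentration):
    """Hypothetical helper: initial count of canonical (unmodified, unbound) agents.

    Agents listed in `explicit_counts` keep their value; every other agent
    falls back to `default_concentration`.
    """
    return {agent: explicit_counts.get(agent, default_concentration) for agent in agents}

# Canonical counts taken from the initial conditions defined above
explicit = {"EGFR": 150, "GRB2 (Ash-L)": 200, "GRB2 (S90D)": 20,
            "GRB2 (Grb3)": 70, "SHC1": 100}
agents = ["EGFR", "GRB2 (Ash-L)", "GRB2 (S90D)", "GRB2 (Grb3)", "SHC1", "MAPK3"]

counts = canonical_counts(agents, explicit, default_concentration=75)
print(counts["EGFR"], counts["MAPK3"])  # 150 75
```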