# December 2022 Demo - TA1

- **Harvard** Domain Knowledge Graph
- **UWisc+** xDD
- **SKEMA** Code2FN, Text Reading and METAL, SKEMA Server, MORAE, ISA
- **MIT**
- **SKEMA (SIFT)** FUNMAN
- Products

# SKEMA Demo

## Preamble

In [8]:
### Code2FN
import os
from pprint import pprint
SKEMA_GOOGLE_DRIVE_ROOT = "/home/five/covid/"

### Text Reading and METAL
from automates.utils.fold import dictionary_to_gromet_json, del_nulls
from skema.text_reading.mention_linking.gromet_linker import align_and_link

# File paths (replace the data directory with your local version)
# data_directory = "/media/evo870/github/skema/skema/text_reading/mention_linking/data"
data_directory = "/home/five/other"
embedding_directory = "/home/five/other"

gromet_path = os.path.join(data_directory, "gromet", "test-auto.json")
comments_path = os.path.join(data_directory, "comments", "test-comments.json")
extractions_path = os.path.join(data_directory, "extractions", "test-extractions.json")
embeddings_path = os.path.join(embedding_directory, "embeddings.kv")

Running the SKEMA server
- `cd skema/skema/skema-rs`
- `docker-compose -f docker-compose.memgraph.yml up --build -d`
- `cargo run --bin skema_service`

## Code2FN

Extracting multi-module systems
```
penn_chime/
├── __init__.py
├── cli.py
├── constants.py
└── model
    ├── __init__.py
    ├── parameters.py
    ├── sir.py
    └── validators
        ├── __init__.py
        ├── base.py
        └── validators.py
```

System definition
```
"cli.py"
"constants.py"
"model/parameters.py"
"model/sir.py"
"model/validators/base.py"
"model/validators/validators.py"
```
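
The system definition above lists every module except the `__init__.py` files, as paths relative to the system root. A minimal sketch of how such a list could be derived from a directory tree follows. `list_system_files` is a hypothetical helper, not part of `automates`, and the exact file format expected by the extraction pipeline (quoting, separators) is not specified here, so the sketch only produces the relative paths.

```python
import os
import tempfile


def list_system_files(root):
    """Return sorted, '/'-separated relative paths of all non-__init__ .py files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py") and name != "__init__.py":
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                paths.append(rel.replace(os.sep, "/"))
    return sorted(paths)


# Recreate the penn_chime layout shown above in a temporary directory.
layout = [
    "__init__.py", "cli.py", "constants.py",
    "model/__init__.py", "model/parameters.py", "model/sir.py",
    "model/validators/__init__.py", "model/validators/base.py",
    "model/validators/validators.py",
]
with tempfile.TemporaryDirectory() as tmp:
    for rel in layout:
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty placeholder file
    system_files = list_system_files(tmp)

for rel in system_files:
    print(rel)
```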

In [19]:
from automates.utils.script_functions import process_file_system
from automates.program_analysis.JSON2GroMEt.json2gromet import json_to_gromet

CHIME_PATH=os.path.join(SKEMA_GOOGLE_DRIVE_ROOT, "compt-model-python")
CHIME_SYSTEM_NAME="test"
SYSTEM_ROOT_PATH=os.path.join(CHIME_PATH, CHIME_SYSTEM_NAME)
SYSTEM_FILES_PATH=os.path.join(CHIME_PATH, "system_filepaths.txt")

# process_file_system takes the three parameters defined above and generates a
# single GroMEt JSON file containing the generated GroMEt for the whole system
process_file_system(CHIME_SYSTEM_NAME, SYSTEM_ROOT_PATH, SYSTEM_FILES_PATH)
# chime_module_collection = json_to_gromet('penn_chime--Gromet-FN-auto.json')

/home/five/covid/compt-model-python/test/main.py
found numpy
found scipy.integrate
found matplotlib.pyplot
Calling IdCollapsePass------------------------

Calling ContainerScopePass-------------------

Calling VariableVersionPass-------------------

Calling GrfnVarCreationPass-------------------

Calling GrfnAssignmentPass-------------------

Calling LambdaExpressionPass-------------------

Calling ToGrometPass-----------------------


In [24]:
import json  # not imported in the preamble, so import it here
with open('test--Gromet-FN-auto.json', 'r') as f:
    chime_module_collection = json.load(f)
pprint(chime_module_collection)

{'executables': [],
 'module_index': ['test.main\n'],
 'modules': [{'attributes': [{'type': 'FN',
                              'value': {'b': [{'function_type': 'FUNCTION',
                                               'name': 'open'}]}},
                             {'type': 'IMPORT',
                              'value': {'name': 'json.load',
                                        'src_language': 'Python',
                                        'type': 'OTHER',
                                        'version': '3.8'}},
                             {'type': 'IMPORT',
                              'value': {'name': 'json.load',
                                        'src_language': 'Python',
                                        'type': 'OTHER',
                                        'version': '3.8'}},
                             {'type': 'FN',
                              'value': {'b': [{'function_type': 'EXPRESSION'}],
                                        'bf': [{'fu
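
Printing the whole collection is hard to read, so a summary can be more useful. The sketch below counts module attributes by their `type` field. It runs on a small stand-in dictionary that mirrors only the keys visible in the (truncated) output above; `count_attribute_types` is a hypothetical helper, and real GroMEt files contain many more fields.

```python
from collections import Counter

# Stand-in data: only the keys visible in the truncated output above.
module_collection = {
    "executables": [],
    "module_index": ["test.main\n"],
    "modules": [
        {"attributes": [
            {"type": "FN", "value": {"b": [{"function_type": "FUNCTION", "name": "open"}]}},
            {"type": "IMPORT", "value": {"name": "json.load"}},
            {"type": "IMPORT", "value": {"name": "json.load"}},
            {"type": "FN", "value": {"b": [{"function_type": "EXPRESSION"}]}},
        ]}
    ],
}


def count_attribute_types(collection):
    """Count the attributes of every module in the collection by their 'type' field."""
    counts = Counter()
    for module in collection.get("modules", []):
        for attribute in module.get("attributes", []):
            counts[attribute.get("type")] += 1
    return counts


attribute_counts = count_attribute_types(module_collection)
print(dict(attribute_counts))
```

The same function could be applied to `chime_module_collection` from the cell above, assuming it has the same top-level layout.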

## Text Reading and Metadata Linking (METAL)

![link_graph](figs/link_graph.png)

In [25]:
linked_gromet = align_and_link(gromet_path, comments_path, extractions_path, embeddings_path)
# Save linked gromet
with open("linked_gromet.json", 'w') as f:
    f.write(dictionary_to_gromet_json(del_nulls(linked_gromet.to_dict())))

FileNotFoundError: [Errno 2] No such file or directory: '/home/five/other/gromet/test-auto.json'
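
The error above means that at least one of the input paths built in the preamble does not exist on this machine. A small pre-check, sketched below, reports every missing input at once before the linker is called. `missing_paths` is a hypothetical helper and the temporary files only stand in for the real data directory.

```python
import os
import tempfile


def missing_paths(named_paths):
    """Return {name: path} for every path that does not exist on disk."""
    return {name: path for name, path in named_paths.items()
            if not os.path.exists(path)}


with tempfile.TemporaryDirectory() as tmp:
    # One input exists, the other does not.
    existing = os.path.join(tmp, "embeddings.kv")
    open(existing, "w").close()
    inputs = {
        "gromet_path": os.path.join(tmp, "gromet", "test-auto.json"),
        "embeddings_path": existing,
    }
    missing = missing_paths(inputs)

for name, path in missing.items():
    print(f"{name} not found: {path}")
```

In the notebook, the dictionary would hold `gromet_path`, `comments_path`, `extractions_path`, and `embeddings_path`.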

In [None]:
# Display linked gromet file
linked_gromet

Example of `doubling_time` units metadata:

```
{"metadata_type": "text_units",
 "text_extraction": {
     "document_reference_uid": "7596863165735053645",
     "page": 6,
     "block": 9,
     "char_begin": 261,
     "char_end": 287
 },
 "variable_identifier": "doubling time",
 "unit_type": "days",
 "grounding": [
     {"argument_name":"variable",
      "id":"pato:0000165",
      "description":"time",
      "score":0.8254868}
 ],
 "provenance": {
     "method": "embedding_similarity_1.0",
     "timestamp": "2022-12-08 07:27:40.876704"
 }
}
```
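
A metadata record like this can be consumed with the standard `json` module. The sketch below loads the example, written as strict JSON, and selects the grounding with the highest score. `best_grounding` is a hypothetical helper; the field names are taken from the example, and other metadata types may be laid out differently.

```python
import json

units_metadata = json.loads("""
{"metadata_type": "text_units",
 "text_extraction": {
     "document_reference_uid": "7596863165735053645",
     "page": 6, "block": 9, "char_begin": 261, "char_end": 287
 },
 "variable_identifier": "doubling time",
 "unit_type": "days",
 "grounding": [
     {"argument_name": "variable", "id": "pato:0000165",
      "description": "time", "score": 0.8254868}
 ],
 "provenance": {
     "method": "embedding_similarity_1.0",
     "timestamp": "2022-12-08 07:27:40.876704"
 }
}
""")


def best_grounding(metadata):
    """Return the grounding entry with the highest score, or None if there are none."""
    return max(metadata.get("grounding", []),
               key=lambda grounding: grounding["score"],
               default=None)


best = best_grounding(units_metadata)
print(units_metadata["variable_identifier"], "->", units_metadata["unit_type"])
print(best["id"], best["description"], best["score"])
```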

## SKEMA Server and FN Graph Service

Swagger docs for the SKEMA server REST API: http://localhost:8080/api-docs/

In [113]:
from IPython.display import IFrame
IFrame('http://localhost:8080/api-docs/', width=700, height=350)

In [31]:
import json
from requests import get, put, post, delete
from graphviz import Source  # used by the MathML visualization cells below
BASE_URL="http://localhost:8080"

Programmatically verify the service is running:

In [32]:
get(f"{BASE_URL}/ping").text

'The SKEMA Rust web services are running.'

To view the **memgraph web interface**, open `http://localhost:3000` in a separate browser window.

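Load a GroMEt JSON file into the database. The cell below posts `test--Gromet-FN-auto.json`; the same call applies to other GroMEt files such as `CHIME_SVIIvR_core--Gromet-FN-auto.json`. The operation returns the ID of the model that was just inserted; we will use this ID to query the model later.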

In [125]:
from requests import post
with open('test--Gromet-FN-auto.json') as f:
    r = post(f"{BASE_URL}/models", json=json.loads(f.read()))
    if r.status_code == 200:
        MODEL_ID = r.json()
        print(f"Model ID: {MODEL_ID}")
    else:
        print(json.loads(r.request.body))

{'schema': 'FN', 'schema_version': '0.1.5', 'name': 'main', 'fn': {'b': [{'function_type': 'MODULE', 'name': 'module', 'metadata': 2}], 'bf': [{'function_type': 'LITERAL', 'value': {'value_type': 'List', 'value': './population.json'}, 'metadata': 3}, {'function_type': 'LITERAL', 'value': {'value_type': 'List', 'value': 'rb'}, 'metadata': 4}, {'function_type': 'FUNCTION', 'contents': 1, 'name': 'module.open_0', 'metadata': 5}, {'function_type': 'FUNCTION', 'contents': 2, 'metadata': 7}, {'function_type': 'LITERAL', 'value': {'value_type': 'List', 'value': './params.json'}, 'metadata': 9}, {'function_type': 'LITERAL', 'value': {'value_type': 'List', 'value': 'rb'}, 'metadata': 10}, {'function_type': 'FUNCTION', 'contents': 1, 'name': 'module.open_1', 'metadata': 11}, {'function_type': 'FUNCTION', 'contents': 3, 'metadata': 13}, {'function_type': 'EXPRESSION', 'contents': 4, 'metadata': 16}, {'function_type': 'FUNCTION', 'contents': 5, 'name': 'module.sum_0', 'metadata': 17}, {'function_t

We then get the IDs of all the models in the database to double-check that our model ID is among them.

In [None]:
get(f"{BASE_URL}/models").json()

Run a query to collect the names of all of the _named_ `opi`s (outer port inputs) in the model.

In [None]:
get(f"{BASE_URL}/models/{MODEL_ID}/named_opis").json()

The API endpoint for **deleting** a model.

In [None]:
delete(f"{BASE_URL}/models/{MODEL_ID}").text

Get the IDs of all the models in the database again to check that the model has been deleted.

In [None]:
get(f"{BASE_URL}/models").json()

## Model Role Analysis and Extraction (MORAE) progress

Developing Cypher graph queries to supplement code role pattern definitions in conjunction with the Data Science Ontology.

<div>
<img src="figs/fn_graph_core_dynamics_legend.png" width="500"/>
</div>

## Incremental Structure Analysis (ISA) progress

**A quick review**: We proposed using seeded graph matching (SGM) to achieve incremental structural alignment (ISA) of equations in documents and their implementation in source code. At a high level, the procedure is as follows:

1. Create a graph representation of the equation and the code.
2. Construct the adjacency matrices corresponding to the above graph representations. 
3. Apply the SGM algorithm with the two adjacency matrices as inputs.
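
The three steps can be illustrated on a toy example. The sketch below builds adjacency matrices for two small directed graphs (one standing in for the equation, one for the code) and aligns them given a single seed. Note the assumptions: the graphs and node names are hypothetical, not the representations produced by SKEMA, and the matching step is an exhaustive search over the unseeded nodes, used here as a stand-in for the SGM algorithm, which is an approximate solver intended for graphs where exhaustive search is infeasible.

```python
from itertools import permutations


def adjacency_matrix(nodes, edges):
    """Build a 0/1 adjacency matrix (list of lists) for a directed graph."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def seeded_match(A, B, seeds):
    """Exhaustively find the node mapping A -> B that preserves the most edges.

    seeds maps node indices of A to node indices of B and is held fixed.
    """
    n = len(A)
    free_a = [i for i in range(n) if i not in seeds]
    free_b = [j for j in range(n) if j not in seeds.values()]
    best, best_score = None, -1
    for perm in permutations(free_b):
        mapping = dict(seeds)
        mapping.update(zip(free_a, perm))
        # Count the edges of A that land on edges of B under this mapping.
        score = sum(A[i][j] * B[mapping[i]][mapping[j]]
                    for i in range(n) for j in range(n))
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Step 1: toy graphs for S_{t+1} = S_t - beta * S_t * I_t and its code counterpart.
eq_nodes = ["S_next", "minus", "S", "times", "beta", "I"]
eq_edges = [("minus", "S_next"), ("S", "minus"), ("times", "minus"),
            ("beta", "times"), ("S", "times"), ("I", "times")]
code_nodes = ["b", "i", "s", "mul", "sub", "s_new"]
code_edges = [("sub", "s_new"), ("s", "sub"), ("mul", "sub"),
              ("b", "mul"), ("s", "mul"), ("i", "mul")]

# Step 2: adjacency matrices.
A = adjacency_matrix(eq_nodes, eq_edges)
B = adjacency_matrix(code_nodes, code_edges)

# Step 3: align, with beta <-> b given as the seed.
seeds = {eq_nodes.index("beta"): code_nodes.index("b")}
mapping, score = seeded_match(A, B, seeds)
aligned = {eq_nodes[i]: code_nodes[j] for i, j in mapping.items()}
print(score, aligned)
```

Without the seed, `beta` and `I` would be interchangeable in this toy graph, since both only feed the multiplication node; the seed is what makes the alignment unambiguous.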

Constructing graph representations for equations and code that are suitable for the alignment procedure is a nontrivial task.

Consider the S update equation in the CHIME SIR model.

In [None]:
from IPython.display import Image
IMAGE_PATH = 'sir.png'
Image(filename=IMAGE_PATH)

The presentation MathML representation of the above equation is shown below.

In [None]:
# Raw string, so that "\beta" in the title attribute is not read as a backspace escape ("\b")
content = \
r"""<math 
    xmlns="http://www.w3.org/1998/Math/MathML" 
    display="block" 
    title="S_{t+1} = S_t - \beta S_t I_t "
>
  <mrow>
    <msub>
      <mrow>
        <mi>S</mi>
      </mrow>
      <mrow>
        <mi>t</mi>
        <mo>+</mo>
        <mn>1</mn>
      </mrow>
    </msub>
    <mo>=</mo>
    <msub>
      <mrow>
        <mi>S</mi>
      </mrow>
      <mrow>
        <mi>t</mi>
      </mrow>
    </msub>
    <mo>-</mo>
    <mi>β</mi>
    <msub>
      <mrow>
        <mi>S</mi>
      </mrow>
      <mrow>
        <mi>t</mi>
      </mrow>
    </msub>
    <msub>
      <mrow>
        <mi>I</mi>
      </mrow>
      <mrow>
        <mi>t</mi>
      </mrow>
    </msub>
  </mrow>
</math>"""

We can visualize the above MathML document using our ast-graph REST API, which returns a DOT representation of the MathML AST for visualization.

In [None]:
Source(
    put('http://localhost:8080/mathml/ast-graph',
        data=content.encode('utf-8')).text
)
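
To see what a DOT rendering of a MathML tree looks like without the server running, the element tree can be walked with the standard library. The sketch below is a rough stand-in, not the service's implementation: `mathml_to_dot` is a hypothetical helper, the node labels are our own choice, and the sample is a shortened fragment of the equation above.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

sample = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msub><mi>S</mi><mi>t</mi></msub>
    <mo>-</mo>
    <mi>β</mi>
  </mrow>
</math>"""


def mathml_to_dot(mathml_text):
    """Emit a DOT digraph with one node per MathML element and parent -> child edges."""
    root = ET.fromstring(mathml_text)
    lines = ["digraph mathml_ast {"]
    counter = 0

    def visit(element, parent_id=None):
        nonlocal counter
        node_id = counter
        counter += 1
        tag = element.tag.replace(MATHML_NS, "")  # drop the namespace prefix
        text = (element.text or "").strip()
        label = f"{tag}: {text}" if text else tag
        lines.append(f'  n{node_id} [label="{label}"];')
        if parent_id is not None:
            lines.append(f"  n{parent_id} -> n{node_id};")
        for child in element:
            visit(child, node_id)

    visit(root)
    lines.append("}")
    return "\n".join(lines)


dot = mathml_to_dot(sample)
print(dot)
```

The resulting string can be passed to `graphviz.Source` in the same way as the server response.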

However, presentation MathML is only concerned with the *presentation* of the equation, not its *content*. To work with the semantic content of the equation, we must transform the presentation MathML into a semantically meaningful representation using a set of heuristic rules. We call this representation the 'math expression graph' (MEG), though we may come up with a better name for it later.

In the cell below, we send a MathML document to the `/mathml/math-exp-graph` endpoint, which automatically constructs the MEG and returns a DOT representation of it for visualization.

The generated representation (shown below) encodes the semantic content of the MathML equation compactly and can then be represented as an adjacency matrix for graph alignment.

In [None]:
Source(
    put(
        'http://localhost:8080/mathml/math-exp-graph',
        data=content.encode('utf-8')).text
)

## BiLayer Extraction

In [None]:
with open('CHIME_SIR_dynamics_BiLayer.json', 'r') as f:
    chime_sir_bl = json.load(f)
pprint(chime_sir_bl)

In [None]:
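# Use a separate name so the BiLayer loaded in the previous cell is not overwritten
with open('CHIME_SIR_dynamics_BiLayer-metadata.json', 'r') as f:
    chime_sir_bl_metadata = json.load(f)
pprint(chime_sir_bl_metadata)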

# Products

- [xDD] URL extractions from documents
- [xDD] Co-citation graph of documents with URLs
- [SKEMA] Function Networks with linked metadata
- [SKEMA] **BiLayers** with linked metadata
- [MIT] Models with machine-readable annotations. Connections to:
  - Ontology elements
  - Descriptive formulas
  - Paper passages
  - Structured data elements
- [SKEMA] Equation MathML
- [SKEMA-FUNMAN] Feasible Model Parameters with quality metrics
