## Biomappings and Epidemiology in ASKEM

The DARPA [Automating Scientific Knowledge Extraction and Modeling (ASKEM)](https://www.darpa.mil/news-events/2021-12-06) program aims to develop technologies to help experts more quickly create, maintain, and analyze models of complex systems such as the spread of disease during a pandemic.



- automatically assemble domain knowledge
- deduplicate redundant information in controlled vocabularies
- make mapping predictions and curate them in Biomappings

## Ontology Summary

There are hundreds of medium- and high-quality ontologies in the OBO Foundry, and thousands of low-, medium-, and high-quality ontologies in BioPortal.

In [42]:
import bioregistry
from tabulate import tabulate
from IPython.display import Markdown

ontologies = {
    "apollosv",
    "idomal",
    "cemo",
    "ido",
    "vo",
    "ovae",
    "oae",
    "cido",
    "covoc",
    "idocovid19",
    "vido",
}

table = tabulate(
    [
        (
            f"[{ontology}](https://bioregistry.io/{ontology})", 
            bioregistry.get_name(ontology).removeprefix("The "), 
        )
        for ontology in sorted(ontologies)
    ], 
    headers=["Prefix", "Name"], 
    tablefmt="github",
)

Markdown(f"""\
The following {len(ontologies)} ontologies are used in the
ASKEM Epidemiology Knowledge Graph:

{table}
""")

The following 11 ontologies are used in the
ASKEM Epidemiology Knowledge Graph:

| Prefix                                          | Name                                          |
|-------------------------------------------------|-----------------------------------------------|
| [apollosv](https://bioregistry.io/apollosv)     | Apollo Structured Vocabulary                  |
| [cemo](https://bioregistry.io/cemo)             | COVID-19 epidemiology and monitoring ontology |
| [cido](https://bioregistry.io/cido)             | Coronavirus Infectious Disease Ontology       |
| [covoc](https://bioregistry.io/covoc)           | CoVoc Coronavirus Vocabulary                  |
| [ido](https://bioregistry.io/ido)               | Infectious Disease Ontology                   |
| [idocovid19](https://bioregistry.io/idocovid19) | COVID-19 Infectious Disease Ontology          |
| [idomal](https://bioregistry.io/idomal)         | Malaria Ontology                              |
| [oae](https://bioregistry.io/oae)               | Ontology of Adverse Events                    |
| [ovae](https://bioregistry.io/ovae)             | Ontology of Vaccine Adverse Events            |
| [vido](https://bioregistry.io/vido)             | Virus Infectious Disease Ontology             |
| [vo](https://bioregistry.io/vo)                 | Vaccine Ontology                              |


## Mapping Curation Summary

In [43]:
import biomappings

mappings_df = biomappings.load_concat_mappings_df()

filter_df(mappings_df).groupby("status").count()["relation"].to_frame()

| status    | relation |
|-----------|----------|
| negative  | 19       |
| positive  | 157      |
| predicted | 163      |
| unsure    | 2        |


In [38]:
import pandas as pd

def get_idx(df):
    return (
        (df["source prefix"].isin(ontologies)) 
        | (df["target prefix"].isin(ontologies)) 
    )

def get_in_idx(df):
    return (
        (df["source prefix"].isin(ontologies)) 
        & (df["target prefix"].isin(ontologies)) 
    )

def get_out_idx(df):
    return (
        (df["source prefix"].isin(ontologies)) 
        & ~(df["target prefix"].isin(ontologies)) 
    )

def filter_df(df):
    return df[get_idx(df)]

def view_df(df):
    rv = filter_df(df)
    rv = rv[rv.columns[:7]]
    return rv

def print_summary(df, label):
    print(f"""\
Of the {get_idx(df).sum()} {label} mappings, \
{get_in_idx(df).sum()} are between two of the \
ASKEM Epidemiology ontologies and {get_out_idx(df).sum()} \
are from an ASKEM Epidemiology ontology to an external ontology.
""".rstrip())
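The helper functions above classify each mapping by which side belongs to the ASKEM ontology set. The following is a minimal sketch of that set logic on toy data. It assumes a hypothetical three-prefix subset (`askem`) and made-up rows, and it adds an `inbound` mask (only the target is in the set) that the notebook's summary does not report separately.

```python
import pandas as pd

# Hypothetical three-prefix subset of the ASKEM ontology set, for illustration only
askem = {"apollosv", "ido", "vo"}

# Toy (source prefix, target prefix) rows; these are not real Biomappings counts
toy = pd.DataFrame(
    [
        ("apollosv", "ido"),   # internal: both sides are in the set
        ("apollosv", "ncit"),  # outbound: source inside, target outside
        ("vo", "chebi"),       # outbound
        ("doid", "ido"),       # inbound: only the target is inside
        ("agrovoc", "agro"),   # unrelated: neither side is inside
    ],
    columns=["source prefix", "target prefix"],
)

src = toy["source prefix"].isin(askem)
tgt = toy["target prefix"].isin(askem)

counts = {
    "any": int((src | tgt).sum()),        # same logic as get_idx
    "internal": int((src & tgt).sum()),   # same logic as get_in_idx
    "outbound": int((src & ~tgt).sum()),  # same logic as get_out_idx
    "inbound": int((~src & tgt).sum()),   # assumption: not in the notebook
}
print(counts)
```

The "any" count equals internal + outbound + inbound, so the two numbers printed by `print_summary` add up to its total only when there are no inbound rows. The positive-mapping summary in this notebook reports 58 + 99 = 157, which suggests that is the case there.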

In [5]:
positive_mappings_df = biomappings.load_positive_mappings_df()
negative_mappings_df = biomappings.load_negative_mappings_df()
predicted_mappings_df = biomappings.load_predicted_mappings_df()

| | source prefix | source identifier | source name | relation | target prefix | target identifier | target name | type | source | prediction_type | prediction_source | prediction_confidence | status | confidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | agrovoc | 0619dd9e | chain harrows | skos:exactMatch | agro | 00000137 | chain harrow | manually_reviewed | orcid:0000-0002-2627-0696 | | | | positive | |
| 1 | agrovoc | 0a9fbc47 | scythes | skos:exactMatch | agro | 00000456 | scythe | manually_reviewed | orcid:0000-0002-2627-0696 | | | | positive | |
| 2 | agrovoc | 1012 | border irrigation | skos:exactMatch | agro | 00000066 | border irrigation process | manually_reviewed | orcid:0000-0002-2627-0696 | | | | positive | |
| 3 | agrovoc | 10457 | draught animals | skos:exactMatch | agro | 00000116 | draft animal | manually_reviewed | orcid:0000-0002-2627-0696 | | | | positive | |
| 4 | agrovoc | 1101 | broadcasting | skos:exactMatch | agro | 00000040 | broadcast application method | manually_reviewed | orcid:0000-0002-2627-0696 | | | | positive | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 40686 | wikipathways | WP880 | MAPK signaling pathway | RO:HOM0000017 | wikipathways | WP998 | MAPK signaling pathway | lexical | https://github.com/biomappings/biomappings/blo... | | | | predicted | 0.95 |
| 40687 | wikipathways | WP899 | Wnt signaling pathway | RO:HOM0000017 | wikipathways | WP980 | Wnt signaling pathway | lexical | https://github.com/biomappings/biomappings/blo... | | | | predicted | 0.95 |
| 40688 | wikipathways | WP927 | Hepatocyte growth factor receptor signaling | RO:HOM0000017 | wikipathways | WP94 | Hepatocyte growth factor receptor signaling | lexical | https://github.com/biomappings/biomappings/blo... | | | | predicted | 0.95 |
| 40689 | wikipathways | WP97 | Wnt signaling pathway | RO:HOM0000017 | wikipathways | WP980 | Wnt signaling pathway | lexical | https://github.com/biomappings/biomappings/blo... | | | | predicted | 0.95 |


In [28]:
view_df(positive_mappings_df)

| | source prefix | source identifier | source name | relation | target prefix | target identifier | target name |
|---|---|---|---|---|---|---|---|
| 142 | apollosv | 00000097 | ecosystem | skos:exactMatch | envo | ENVO:01001110 | ecosystem |
| 143 | apollosv | 00000114 | infection | skos:exactMatch | ido | 0000586 | infection |
| 144 | apollosv | 00000142 | vaccination | skos:exactMatch | idomal | 0001040 | vaccination |
| 145 | apollosv | 00000142 | vaccination | skos:exactMatch | vo | 0000002 | vaccination |
| 146 | apollosv | 00000154 | exposed population | skos:exactMatch | ncit | C71551 | Exposed Population |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | vo | 0010950 | SecA2 | skos:exactMatch | ogg | 3000885594 | secA2 |
| 8803 | vo | 0011210 | SERA-5 | skos:exactMatch | idomal | 0001109 | SERA-5 |
| 8804 | vo | 0011241 | Hsp90 | skos:exactMatch | pr | PR:000025350 | HSPC protein |
| 8805 | vo | 0012381 | LSA-3 | skos:exactMatch | idomal | 0001093 | LSA-3 |


In [41]:
d = view_df(positive_mappings_df)
d[d["source name"] != d["target name"]]

| | source prefix | source identifier | source name | relation | target prefix | target identifier | target name |
|---|---|---|---|---|---|---|---|
| 146 | apollosv | 00000154 | exposed population | skos:exactMatch | ncit | C71551 | Exposed Population |
| 157 | apollosv | 00000239 | infectious disease | skos:exactMatch | doid | DOID:0050117 | disease by infectious agent |
| 159 | apollosv | 00000311 | Breteau Index | skos:exactMatch | vsmo | 0000070 | breteau index |
| 165 | apollosv | 00000429 | date | skos:exactMatch | dc | date | Date |
| 859 | cemo | case_fatality_rate | case fatality rate | skos:exactMatch | ncit | C173779 | Case Fatality Rate |
| 860 | cemo | infection_fatality_rate | infection fatality rate | skos:exactMatch | ncit | C173780 | Infection Fatality Rate |
| 2691 | cido | 0000390 | BALB/c mouse | skos:exactMatch | vo | 0000051 | Balb/c mouse |
| 2692 | cido | 0001004 | Bevacizumab | skos:exactMatch | dron | 00014332 | bevacizumab |
| 2693 | cido | 0001007 | EIDD-2801 | skos:exactMatch | chebi | CHEBI:180653 | molnupiravir |
| 2694 | cido | 0001037 | Fostamatinib | skos:exactMatch | dron | 00819888 | fostamatinib |
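
Most rows in the table above differ only in capitalization. A minimal sketch of a case-insensitive variant of the same filter follows; it runs on a handful of name pairs copied from that table rather than on the full `positive_mappings_df`, and the variable names are illustrative.

```python
import pandas as pd

# A few (source name, target name) pairs copied from the table above
pairs = pd.DataFrame(
    [
        ("exposed population", "Exposed Population"),
        ("infectious disease", "disease by infectious agent"),
        ("Breteau Index", "breteau index"),
        ("BALB/c mouse", "Balb/c mouse"),
        ("EIDD-2801", "molnupiravir"),
    ],
    columns=["source name", "target name"],
)

# Compare casefolded names so capitalization-only differences drop out
differs = pairs["source name"].str.casefold() != pairs["target name"].str.casefold()
substantive = pairs[differs]
print(substantive.to_string(index=False))
```

On these five pairs, only "infectious disease" / "disease by infectious agent" and "EIDD-2801" / "molnupiravir" remain, which are the mappings where the curated match is not a simple string match.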


In [39]:
print_summary(positive_mappings_df, "positive")

Of the 157 positive mappings, 58 are between two of the ASKEM Epidemiology ontologies and 99 are from an ASKEM Epidemiology ontology to an external ontology.


In [None]:
positive_mappings_df

In [7]:
view_df(negative_mappings_df)

| | source prefix | source identifier | source name | relation | target prefix | target identifier | target name |
|---|---|---|---|---|---|---|---|
| 20 | apollosv | 94 | community | skos:exactMatch | ncit | C16453 | Community |
| 21 | apollosv | 340 | agonism | skos:exactMatch | nbo | 0000015 | aggressive behavior |
| 22 | apollosv | 340 | agonism | skos:exactMatch | nbo | 0000121 | agonistic behavior |
| 236 | idomal | 1123 | RAP-1 | skos:exactMatch | vo | 0011026 | RAP-1 |
| 1140 | vo | 10896 | Ssb | skos:exactMatch | uberon | UBERON:0004199 | S-shaped body |
| 1141 | vo | 10988 | IroN | skos:exactMatch | chebi | CHEBI:18248 | iron atom |
| 1142 | vo | 10997 | IroN | skos:exactMatch | chebi | CHEBI:18248 | iron atom |
| 1143 | vo | 11021 | CP | skos:exactMatch | chebi | CHEBI:3380 | captopril |
| 1144 | vo | 11021 | CP | skos:exactMatch | hp | HP:0100021 | Cerebral palsy |
| 1145 | vo | 11026 | RAP-1 | skos:exactMatch | idomal | 0001123 | RAP-1 |


In [8]:
view_df(predicted_mappings_df)

| | source prefix | source identifier | source name | relation | target prefix | target identifier | target name |
|---|---|---|---|---|---|---|---|
| 0 | apollosv | 00000172 | susceptibility | skos:exactMatch | pato | PATO:0001043 | susceptibility toward |
| 1 | apollosv | 00000238 | host | skos:exactMatch | ido | 0000531 | host |
| 2 | apollosv | 00000243 | duration | skos:exactMatch | pato | PATO:0001309 | duration |
| 3 | apollosv | 00000317 | incubation period | skos:exactMatch | vsmo | 0000499 | intrinsic incubation period |
| 4 | apollosv | 00000357 | predation | skos:exactMatch | idomal | 0002193 | predation |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 39001 | vo | 0011171 | HN | skos:exactMatch | ncit | C158173 | Hani Chinese |
| 39002 | vo | 0011220 | MSP3 | skos:exactMatch | doid | DOID:0111386 | inclusion body myopathy with early-onset Paget... |
| 39003 | vo | 0011220 | MSP3 | skos:exactMatch | idomal | 0001145 | MSP-3 |
| 39004 | vo | 0011221 | MSP8 | skos:exactMatch | idomal | 0001150 | MSP-8 |
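
Several predicted pairs above, such as MSP3 and MSP-3, differ only in punctuation, and the concatenated mappings table labels predictions with the type `lexical`. The following is a minimal sketch of name-based matching under an assumed normalization (casefold, then drop non-alphanumeric characters). It is not the Biomappings prediction pipeline; `normalize` and `predict_matches` are hypothetical helpers, and the term dictionaries hold only a few names copied from this notebook's tables.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Casefold the name and drop every character that is not a-z or 0-9."""
    return re.sub(r"[^0-9a-z]", "", name.casefold())


# Names copied from this notebook's tables; keys are (prefix, identifier)
vo_terms = {
    ("vo", "0011220"): "MSP3",
    ("vo", "0011221"): "MSP8",
    ("vo", "0011241"): "Hsp90",
}
idomal_terms = {
    ("idomal", "0001145"): "MSP-3",
    ("idomal", "0001150"): "MSP-8",
    ("idomal", "0001040"): "vaccination",
}


def predict_matches(source, target):
    """Pair source and target terms whose normalized names are identical."""
    index = defaultdict(list)
    for ref, name in target.items():
        index[normalize(name)].append(ref)
    return [
        (s_ref, t_ref)
        for s_ref, name in source.items()
        for t_ref in index.get(normalize(name), [])
    ]


predictions = predict_matches(vo_terms, idomal_terms)
for s_ref, t_ref in predictions:
    print(":".join(s_ref), "->", ":".join(t_ref))
```

Matching on names alone is permissive, which is why predictions are curated afterwards: the negative mappings in this notebook include pairs such as "IroN" / "iron atom" and "CP" / "captopril" that were rejected on review.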
