# BTE metaKG visualization

The goal of this notebook is to visualize the metaKG that BTE uses. The result is similar to the [subway diagram](https://raw.githubusercontent.com/biothings/BioThings_Explorer_TRAPI/main/diagrams/smartapi_metagraph.png) we've used before, but updated to the size and scale of the current metaKG.

Optimizations
* remove less-commonly-used node types from subject/object
* only count in one direction (`A-treats-B` gets merged with `B-treated_by-A`)

In [1]:
import biothings_client
import json5
import networkx as nx
import pandas as pd
import re
import requests

## Read in SmartAPI data

### Option 1 -- Read in the SmartAPI ndjson file

In [2]:
# older input file from Chunlei
#df = pd.read_json('data/smartapi_metakg_20230413.ndjson.gz', lines=True)

# newer input file from Chunlei -- preprocessed using following command:
#      gzip -cd smartapi_metakg_20230413.ndjson.gz | jq -c ' ._source' | gzip > smartapi_metakg_20230413b.ndjson.gz
df = pd.read_json('data/smartapi_metakg_20230413b.ndjson.gz', lines=True)
df

Unnamed: 0,subject,object,predicate,provided_by,api,bte
0,Disease,Disease,superclass_of,infores:disease-ontology,"{'name': 'Ontology Lookup Service API', 'smart...",{'query_operation': {'params': {'id': '{{ quer...
1,SequenceVariant,Gene,is_sequence_variant_of,infores:dbsnp,"{'name': 'LitVar API', 'smartapi': {'metadata'...",{'query_operation': {'params': {'variantid': '...
2,GeneOrGeneProduct,ChemicalEntity,has_part,,"{'name': 'Ontology-KP API', 'smartapi': {'meta...","{'query_operation': {'path': '/query', 'method..."
3,NucleicAcidEntity,ChemicalEntity,has_part,,"{'name': 'Ontology-KP API', 'smartapi': {'meta...","{'query_operation': {'path': '/query', 'method..."
4,ChemicalEntity,ChemicalEntity,has_part,,"{'name': 'Ontology-KP API', 'smartapi': {'meta...","{'query_operation': {'path': '/query', 'method..."
...,...,...,...,...,...,...
367201,GeographicLocation,Procedure,derives_from,,{'name': 'ARAX Translator Reasoner - TRAPI 1.3...,"{'query_operation': {'path': '/query', 'method..."
367202,ClinicalIntervention,Activity,related_to,,{'name': 'ARAX Translator Reasoner - TRAPI 1.3...,"{'query_operation': {'path': '/query', 'method..."
367203,Polypeptide,AnatomicalEntity,coexists_with,,{'name': 'ARAX Translator Reasoner - TRAPI 1.3...,"{'query_operation': {'path': '/query', 'method..."
367204,NamedThing,Phenomenon,has_part,,{'name': 'ARAX Translator Reasoner - TRAPI 1.3...,"{'query_operation': {'path': '/query', 'method..."
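
The `jq` preprocessing mentioned in the cell above (keep only `._source` from each record) could also be done in Python. The following is a minimal sketch using only the standard library; `unwrap_source` is a hypothetical helper name, and it assumes every non-blank line is a JSON object with a `_source` key.

```python
import gzip
import json


def unwrap_source(src_path, dst_path):
    """Copy a gzipped ndjson file, keeping only each record's `_source` object."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # compact separators mirror the one-object-per-line output of `jq -c`
            dst.write(json.dumps(record["_source"], separators=(",", ":")) + "\n")
```

The output file can then be read with `pd.read_json(..., lines=True)` as in the cell above.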


### Option 2 (preferred) -- Query the SmartAPI API

This method takes longer, but it retrieves the most up-to-date data.

In [3]:
c = biothings_client.get_client('metakg', url='https://smart-api.info/api/metakg')
c._query_endpoint = ''   # the URL above is already the query endpoint, so clear the client's default query path
a = c.query('*', fetch_all=True)

In [4]:
df = pd.DataFrame(a)
df

Fetching 183941 metakg(s) . . .
No more results to return.


Unnamed: 0,_id,_score,api,object,predicate,subject,provided_by
0,54derYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",Gene,related_to,SmallMolecule,
1,6IderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Gene,
2,6YderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Disease,
3,6oderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Disease,
4,64derYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",BiologicalProcess,related_to,SmallMolecule,
...,...,...,...,...,...,...,...
183936,Z4pgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Agent,affects,DiseaseOrPhenotypicFeature,
183937,aIpgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Gene,directly_physically_interacts_with,Gene,
183938,aYpgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,IndividualOrganism,disrupts,MolecularEntity,
183939,aopgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Activity,occurs_in,GeneticInheritance,


### Post-processing

Parse out two new columns holding the API name and ID.

In [5]:
df = df.assign(api_name = lambda x: pd.json_normalize(x['api'])['name'])
df = df.assign(api_id = lambda x: pd.json_normalize(x['api'])['smartapi.id'])
df

Unnamed: 0,_id,_score,api,object,predicate,subject,provided_by,api_name,api_id
0,54derYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",Gene,related_to,SmallMolecule,,CTD API,0212611d1c670f9107baf00b77f0889a
1,6IderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Gene,,CTD API,0212611d1c670f9107baf00b77f0889a
2,6YderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Disease,,CTD API,0212611d1c670f9107baf00b77f0889a
3,6oderYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",SmallMolecule,related_to,Disease,,CTD API,0212611d1c670f9107baf00b77f0889a
4,64derYkBXoKXcforKZub,1.0,"{'name': 'CTD API', 'smartapi': {'id': '021261...",BiologicalProcess,related_to,SmallMolecule,,CTD API,0212611d1c670f9107baf00b77f0889a
...,...,...,...,...,...,...,...,...,...
183936,Z4pgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Agent,affects,DiseaseOrPhenotypicFeature,,ARAX Translator Reasoner - TRAPI 1.4.0,283042abacfe3c6bcdc924c4a226ff98
183937,aIpgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Gene,directly_physically_interacts_with,Gene,,ARAX Translator Reasoner - TRAPI 1.4.0,283042abacfe3c6bcdc924c4a226ff98
183938,aYpgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,IndividualOrganism,disrupts,MolecularEntity,,ARAX Translator Reasoner - TRAPI 1.4.0,283042abacfe3c6bcdc924c4a226ff98
183939,aopgrYkBXoKXcforP2oX,1.0,{'name': 'ARAX Translator Reasoner - TRAPI 1.4...,Activity,occurs_in,GeneticInheritance,,ARAX Translator Reasoner - TRAPI 1.4.0,283042abacfe3c6bcdc924c4a226ff98
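
To see what the flattening step does, here is a small sketch on toy records shaped like the `api` column. The API names and IDs are made up for illustration. It normalizes once and assigns positionally, which is a design choice of this sketch rather than what the cell above does.

```python
import pandas as pd

# Toy records shaped like the `api` column; names and IDs are made up
toy = pd.DataFrame({
    "subject": ["Gene", "Disease"],
    "api": [
        {"name": "Toy API A", "smartapi": {"id": "aaa111"}},
        {"name": "Toy API B", "smartapi": {"id": "bbb222"}},
    ],
})

# Flatten once; nested keys become dotted column names such as "smartapi.id"
flat = pd.json_normalize(toy["api"].tolist())

# Assign by position so the result does not depend on the index of `toy`
toy["api_name"] = flat["name"].to_numpy()
toy["api_id"] = flat["smartapi.id"].to_numpy()
```

Positional assignment matters only when the DataFrame index is not the default `0..n-1` range, for example after filtering.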


## Read in the BTE config file that specifies currently-allowed APIs

In [6]:
bte_config_url = "https://raw.githubusercontent.com/biothings/BioThings_Explorer_TRAPI/main/src/config/apis.js"
r = requests.get(bte_config_url)
str_bte_config = r.text
#print(str_bte_config)
str_bte_config = re.sub(r"exports\.API_LIST = ", "", str_bte_config)                      # remove variable assignment step
str_bte_config = re.sub(r"\s*//.*",             "", str_bte_config)                       # remove comments
str_bte_config = re.sub(r'^$\n',                '', str_bte_config, flags=re.MULTILINE)   # remove blank lines
str_bte_config = re.sub(r',\s*exclude:[^\]]*]', '', str_bte_config, flags=re.MULTILINE)   # remove 'exclude' section
str_bte_config = re.sub(r';$',                  '', str_bte_config, flags=re.MULTILINE)   # remove trailing semicolon
#print(str_bte_config)

bte_config = json5.loads(str_bte_config)['include']
bte_config

[{'id': 'd22b657426375a5295e7da8a303b9893', 'name': 'BioLink API'},
 {'id': '0212611d1c670f9107baf00b77f0889a',
  'name': 'CTD API',
  'primarySource': True},
 {'id': '43af91b3d7cae43591083bff9d75c6dd', 'name': 'EBI Proteins API'},
 {'id': 'dca415f2d792976af9d642b7e73f7a41', 'name': 'LitVar API'},
 {'id': '1f277e1563fcfd124bfae2cc3c4bcdec', 'name': 'QuickGO API'},
 {'id': '1c056ffc7ed0dd1229e71c4752239465',
  'name': 'Ontology Lookup Service API'},
 {'id': '38e9e5169a72aee3659c9ddba956790d', 'name': 'BioThings BindingDB API'},
 {'id': '55a223c6c6e0291dbd05f2faf27d16f4',
  'name': 'BioThings BioPlanet Pathway-Disease API'},
 {'id': 'b99c6dd64abcefe87dcd0a51c249ee6d',
  'name': 'BioThings BioPlanet Pathway-Gene API'},
 {'id': '00fb85fc776279163199e6c50f6ddfc6', 'name': 'BioThings DDInter API'},
 {'id': 'e3edd325c76f2992a111b43a907a4870', 'name': 'BioThings DGIdb API'},
 {'id': 'a7f784626a426d054885a5f33f17d3f8', 'name': 'BioThings DISEASES API'},
 {'id': '1f47552dabd67351d4c625adb0a10d00

In [7]:
bte_config_notrapi = [ item for item in bte_config if not re.search("trapi", item['name'], re.IGNORECASE)]

# temporary hack on next line pending merge of https://github.com/biothings/biothings_explorer/pull/624
bte_config_notrapi = [ item for item in bte_config_notrapi if item['name'] != "Connections Hypothesis Provider API" ]

bte_config_notrapi

[{'id': 'd22b657426375a5295e7da8a303b9893', 'name': 'BioLink API'},
 {'id': '0212611d1c670f9107baf00b77f0889a',
  'name': 'CTD API',
  'primarySource': True},
 {'id': '43af91b3d7cae43591083bff9d75c6dd', 'name': 'EBI Proteins API'},
 {'id': 'dca415f2d792976af9d642b7e73f7a41', 'name': 'LitVar API'},
 {'id': '1f277e1563fcfd124bfae2cc3c4bcdec', 'name': 'QuickGO API'},
 {'id': '1c056ffc7ed0dd1229e71c4752239465',
  'name': 'Ontology Lookup Service API'},
 {'id': '38e9e5169a72aee3659c9ddba956790d', 'name': 'BioThings BindingDB API'},
 {'id': '55a223c6c6e0291dbd05f2faf27d16f4',
  'name': 'BioThings BioPlanet Pathway-Disease API'},
 {'id': 'b99c6dd64abcefe87dcd0a51c249ee6d',
  'name': 'BioThings BioPlanet Pathway-Gene API'},
 {'id': '00fb85fc776279163199e6c50f6ddfc6', 'name': 'BioThings DDInter API'},
 {'id': 'e3edd325c76f2992a111b43a907a4870', 'name': 'BioThings DGIdb API'},
 {'id': 'a7f784626a426d054885a5f33f17d3f8', 'name': 'BioThings DISEASES API'},
 {'id': '1f47552dabd67351d4c625adb0a10d00
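
The config handling in the two cells above can be tried offline on a toy string. The sketch below is not the notebook's method: it swaps `json5` for a few extra regexes so that the standard-library `json` module can parse the result. It is fragile and assumes that string values contain no quotes, colons, or commas. `toy_js`, `js_literal_to_python`, and the IDs are hypothetical.

```python
import json
import re

# Hypothetical, simplified stand-in for the contents of apis.js
toy_js = """
exports.API_LIST = {
  include: [
    // hand-curated list
    { id: 'aaa111', name: 'Toy API' },
    { id: 'bbb222', name: 'Toy TRAPI KP', primarySource: true },
  ],
};
"""


def js_literal_to_python(text):
    """Turn a simple JS object literal into Python data (toy inputs only)."""
    text = re.sub(r"exports\.API_LIST\s*=\s*", "", text)     # drop the assignment
    text = re.sub(r"\s*//.*", "", text)                      # drop line comments
    text = re.sub(r";\s*$", "", text.strip())                # drop the trailing semicolon
    text = re.sub(r"([{,]\s*)(\w+)\s*:", r'\1"\2":', text)   # quote bare keys
    text = text.replace("'", '"')                            # single -> double quotes
    text = re.sub(r",(\s*[\]}])", r"\1", text)               # drop trailing commas
    return json.loads(text)


toy_config = js_literal_to_python(toy_js)["include"]

# Same name-based filter as the cell above: drop anything with "trapi" in its name
toy_notrapi = [item for item in toy_config
               if not re.search("trapi", item["name"], re.IGNORECASE)]
```

For the real file, a JSON5 parser as used in the notebook is the more robust choice, since it handles bare keys, single quotes, and trailing commas without regex guesswork.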

In [8]:
# uncomment one of the following two lines to look at all APIs, or just x-bte APIs
#bte_config_ids = [ x['id'] for x in bte_config ]
bte_config_ids = [ x['id'] for x in bte_config_notrapi ]

print(len(bte_config_ids))
print(bte_config_ids)

34
['d22b657426375a5295e7da8a303b9893', '0212611d1c670f9107baf00b77f0889a', '43af91b3d7cae43591083bff9d75c6dd', 'dca415f2d792976af9d642b7e73f7a41', '1f277e1563fcfd124bfae2cc3c4bcdec', '1c056ffc7ed0dd1229e71c4752239465', '38e9e5169a72aee3659c9ddba956790d', '55a223c6c6e0291dbd05f2faf27d16f4', 'b99c6dd64abcefe87dcd0a51c249ee6d', '00fb85fc776279163199e6c50f6ddfc6', 'e3edd325c76f2992a111b43a907a4870', 'a7f784626a426d054885a5f33f17d3f8', '1f47552dabd67351d4c625adb0a10d00', 'cc857d5b7c8b7609b5bbb38ff990bfff', 'f339b28426e7bf72028f60feefcd7465', '34bad236d77bea0a0ee6c6cba5be54a6', '316eab811fd9ef1097df98bcaa9f7361', 'a5b0ec6bfde5008984d4b6cde402d61f', '32f36164fabed5d3abe6c2fd899c9418', '77ed27f111262d0289ed4f4071faa619', 'edeb26858bd27d0322af93e7a9e08761', 'b772ebfbfa536bba37764d7fddb11d6f', '03283cc2b21c077be6794e1704b1d230', '1d288b3a3caf75d541ffaae3aab386c8', 'ec6d76016ef40f284359d17fbf78df20', '8f08d1446e0bb9c2b323713ce83e2bd3', '671b45c0301c8624abbd26ae78449ca2', '59dce17363dce279d389100

## Join SmartAPI data with BTE config IDs

In [9]:
df_bte = (
    df.query('api_id in @bte_config_ids')[['subject','object','predicate','api_name','api_id']]
    .drop_duplicates()
    .sort_values(['api_name','subject','object','predicate'])
)
df_bte

Unnamed: 0,subject,object,predicate,api_name,api_id
161530,Disease,Gene,caused_by,BioLink API,d22b657426375a5295e7da8a303b9893
161531,Disease,Gene,condition_associated_with_gene,BioLink API,d22b657426375a5295e7da8a303b9893
161532,Disease,Pathway,actively_involves,BioLink API,d22b657426375a5295e7da8a303b9893
161533,Disease,PhenotypicFeature,has_phenotype,BioLink API,d22b657426375a5295e7da8a303b9893
161534,Disease,SequenceVariant,contribution_from,BioLink API,d22b657426375a5295e7da8a303b9893
...,...,...,...,...,...
161568,SmallMolecule,Disease,contributes_to,Text Mining Targeted Association API,978fe380a147a8641caf72320862697b
161550,SmallMolecule,Disease,treats,Text Mining Targeted Association API,978fe380a147a8641caf72320862697b
161554,SmallMolecule,Gene,affects,Text Mining Targeted Association API,978fe380a147a8641caf72320862697b
161566,SmallMolecule,PhenotypicFeature,contributes_to,Text Mining Targeted Association API,978fe380a147a8641caf72320862697b


In [10]:
df_bte.to_csv("results/bte_operations.tsv", sep="\t", index=False)

Note: `bte_operations.tsv` is included in the paper as a Supplemental Table

In [11]:
df_bte['api_name'].value_counts()

BioThings SEMMEDDB API                     826
BioLink API                                 21
Text Mining Targeted Association API        20
MyDisease.info API                          17
Multiomics EHR Risk KP API                  14
Multiomics BigGIM-DrugResponse KP API       14
MyChem.info API                             12
MyGene.info API                             11
MyVariant.info API                          10
BioThings UBERON API                         9
Multiomics Wellness KP API                   9
CTD API                                      8
BioThings IDISK API                          7
BioThings PFOCR API                          6
BioThings GO Biological Process API          6
BioThings DGIdb API                          6
BioThings GO Molecular Function API          4
BioThings MGIgene2phenotype API              4
BioThings GO Cellular Component API          4
Multiomics ClinicalTrials KP                 2
BioThings BioPlanet Pathway-Gene API         2
BioThings Bio

## Summarization

### by subject, object; count # of APIs

In [12]:
df1 = df_bte[["subject","object","api_name"]].drop_duplicates()
df1

Unnamed: 0,subject,object,api_name
161530,Disease,Gene,BioLink API
161532,Disease,Pathway,BioLink API
161533,Disease,PhenotypicFeature,BioLink API
161534,Disease,SequenceVariant,BioLink API
161536,Gene,Disease,BioLink API
...,...,...,...
161573,PhenotypicFeature,Gene,Text Mining Targeted Association API
161567,PhenotypicFeature,SmallMolecule,Text Mining Targeted Association API
161568,SmallMolecule,Disease,Text Mining Targeted Association API
161554,SmallMolecule,Gene,Text Mining Targeted Association API


In [13]:
api_stats = df1.groupby(['subject','object'], group_keys=False)['api_name'].nunique().rename("count").to_frame()
api_stats['list'] = df1.groupby(['subject','object'], group_keys=False)['api_name'].unique().apply(list)
api_stats = api_stats.reset_index().sort_values(by=['count'],ascending=False)

api_stats.head(15)

Unnamed: 0,subject,object,count,list
50,Disease,Gene,13,"[BioLink API, BioThings DISEASES API, BioThing..."
72,Gene,Disease,12,"[BioLink API, BioThings DISEASES API, BioThing..."
61,Disease,SmallMolecule,9,"[BioThings GTRx API, BioThings IDISK API, BioT..."
197,SmallMolecule,Gene,8,"[BioThings BindingDB API, BioThings DGIdb API,..."
83,Gene,SmallMolecule,8,"[BioThings BindingDB API, BioThings DGIdb API,..."
196,SmallMolecule,Disease,8,"[BioThings GTRx API, BioThings IDISK API, BioT..."
73,Gene,Gene,5,"[BioLink API, BioThings SEMMEDDB API, Multiomi..."
135,PhenotypicFeature,SmallMolecule,4,"[BioThings IDISK API, BioThings SEMMEDDB API, ..."
55,Disease,PhenotypicFeature,4,"[BioLink API, BioThings SEMMEDDB API, Multiomi..."
78,Gene,PhenotypicFeature,4,"[BioLink API, BioThings MGIgene2phenotype API,..."
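
The count and the list of APIs per (subject, object) pair can also be computed in a single `groupby` pass with named aggregation. This is a sketch on toy data, offered as an alternative to the two-step approach above; the column name `apis` and the toy values are assumptions of the example.

```python
import pandas as pd

toy = pd.DataFrame({
    "subject":  ["Disease", "Disease", "Disease", "Gene"],
    "object":   ["Gene", "Gene", "Gene", "Disease"],
    "api_name": ["Toy API A", "Toy API B", "Toy API A", "Toy API A"],
})

toy_stats = (
    toy.groupby(["subject", "object"])["api_name"]
    # one output column per keyword: distinct-API count and the sorted API list
    .agg(count="nunique", apis=lambda s: sorted(s.unique()))
    .reset_index()
    .sort_values("count", ascending=False)
)
```

Because both columns come from the same grouping, they cannot drift out of alignment if the input is changed between steps.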


In [14]:
api_stats.to_csv("results/api_stats.tsv", sep="\t", index=False)


### by subject, object; count # of predicates

In [15]:
df1 = df_bte[["subject","object","predicate"]].drop_duplicates()
predicate_stats = df1.groupby(['subject','object'], group_keys=False)['predicate'].nunique().rename("count").to_frame()
predicate_stats['list'] = df1.groupby(['subject','object'], group_keys=False)['predicate'].unique().apply(list)
predicate_stats = predicate_stats.reset_index().sort_values(by=['count'],ascending=False)

predicate_stats.head(15)

Unnamed: 0,subject,object,count,list
47,Disease,Disease,26,"[affected_by, affects, caused_by, causes, coex..."
53,Disease,PathologicalProcess,15,"[affected_by, affects, caused_by, causes, coex..."
73,Gene,Gene,15,"[interacts_with, affected_by, affects, coexist..."
61,Disease,SmallMolecule,15,"[treated_by, adverse_event_of, occurs_together..."
111,PathologicalProcess,Disease,15,"[affected_by, affects, caused_by, causes, coex..."
196,SmallMolecule,Disease,15,"[treats, has_adverse_event, occurs_together_in..."
50,Disease,Gene,13,"[caused_by, condition_associated_with_gene, re..."
197,SmallMolecule,Gene,13,"[physically_interacts_with, affects, interacts..."
72,Gene,Disease,13,"[causes, gene_associated_with_condition, relat..."
83,Gene,SmallMolecule,13,"[physically_interacts_with, affected_by, inter..."


In [16]:
predicate_stats.to_csv("results/predicate_stats.tsv", sep="\t", index=False)

## Filter by most common types

Filter to only include the most common entity types.  Also, since we _mostly_ have the same info in both directions, keep only one direction per pair of types (the one where `subject <= object` alphabetically) to simplify the visualization.

In [17]:
pd.concat([df_bte['subject'], df_bte['object']]).value_counts().head(20)

Disease                     325
Gene                        256
SmallMolecule               250
Polypeptide                 182
PathologicalProcess         174
ChemicalEntity              132
PhenotypicFeature           127
PhysiologicalProcess        124
Procedure                    80
Cell                         72
CellularComponent            71
GrossAnatomicalStructure     70
MolecularActivity            62
Protein                      46
AnatomicalEntity             18
Drug                         16
SequenceVariant              15
BiologicalProcess            15
Food                         14
Pathway                       9
dtype: int64

In [18]:
NUM_TYPES_TO_KEEP = 8

keep = set(pd.concat([df_bte['subject'], df_bte['object']]).value_counts().head(NUM_TYPES_TO_KEEP).keys())
keep

{'ChemicalEntity',
 'Disease',
 'Gene',
 'PathologicalProcess',
 'PhenotypicFeature',
 'PhysiologicalProcess',
 'Polypeptide',
 'SmallMolecule'}
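
The selection of the most common types works by stacking the subject and object columns and counting occurrences. A toy version follows, with made-up rows and a cutoff of 2 instead of `NUM_TYPES_TO_KEEP`:

```python
import pandas as pd

toy = pd.DataFrame({
    "subject": ["Disease", "Disease", "Gene", "Pathway"],
    "object":  ["Gene", "SmallMolecule", "Disease", "Disease"],
})

# A type is counted once for every row in which it appears as subject or as object
type_counts = pd.concat([toy["subject"], toy["object"]]).value_counts()

# value_counts sorts by descending count, so head(n) gives the n most common types
top_types = set(type_counts.head(2).index)
```

Note that when several types tie at the cutoff, which of them makes it into the set depends on how `value_counts` orders the tied entries.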

In [19]:
predicate_stats_filt = predicate_stats.query("subject in @keep & object in @keep & subject <= object").sort_values(by=['count'], ascending=False)
predicate_stats_filt.to_csv("results/predicate_stats_filt.tsv", sep="\t")
predicate_stats_filt

Unnamed: 0,subject,object,count,list
47,Disease,Disease,26,"[affected_by, affects, caused_by, causes, coex..."
73,Gene,Gene,15,"[interacts_with, affected_by, affects, coexist..."
61,Disease,SmallMolecule,15,"[treated_by, adverse_event_of, occurs_together..."
53,Disease,PathologicalProcess,15,"[affected_by, affects, caused_by, causes, coex..."
50,Disease,Gene,13,"[caused_by, condition_associated_with_gene, re..."
83,Gene,SmallMolecule,13,"[physically_interacts_with, affected_by, inter..."
206,SmallMolecule,SmallMolecule,11,"[interacts_with, affected_by, affects, coexist..."
116,PathologicalProcess,PathologicalProcess,11,"[affected_by, affects, caused_by, causes, coex..."
161,Polypeptide,Polypeptide,10,"[affected_by, affects, coexists_with, derives_..."
135,PhenotypicFeature,SmallMolecule,10,"[adverse_event_of, affected_by, caused_by, dis..."


In [20]:
api_stats_filt = api_stats.query("subject in @keep & object in @keep & subject <= object").sort_values(by=['count'], ascending=False)
api_stats_filt.to_csv("results/api_stats_filt.tsv", sep="\t")
api_stats_filt

Unnamed: 0,subject,object,count,list
50,Disease,Gene,13,"[BioLink API, BioThings DISEASES API, BioThing..."
61,Disease,SmallMolecule,9,"[BioThings GTRx API, BioThings IDISK API, BioT..."
83,Gene,SmallMolecule,8,"[BioThings BindingDB API, BioThings DGIdb API,..."
73,Gene,Gene,5,"[BioLink API, BioThings SEMMEDDB API, Multiomi..."
135,PhenotypicFeature,SmallMolecule,4,"[BioThings IDISK API, BioThings SEMMEDDB API, ..."
55,Disease,PhenotypicFeature,4,"[BioLink API, BioThings SEMMEDDB API, Multiomi..."
78,Gene,PhenotypicFeature,4,"[BioLink API, BioThings MGIgene2phenotype API,..."
47,Disease,Disease,4,"[BioThings SEMMEDDB API, Multiomics EHR Risk K..."
206,SmallMolecule,SmallMolecule,4,"[BioThings DDInter API, BioThings IDISK API, B..."
130,PhenotypicFeature,PhenotypicFeature,3,"[BioThings HPO API, BioThings SEMMEDDB API, Mu..."
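
The `subject <= object` condition in the two queries above keeps a row only when the subject sorts alphabetically before (or equal to) the object. A pair that exists in only one direction, and in the "wrong" alphabetical order, is dropped entirely. The sketch below shows this on toy counts and, as a possible alternative that the notebook does not use, merges both directions under a canonical (sorted) pair and keeps the larger count.

```python
import pandas as pd

toy = pd.DataFrame({
    "subject": ["Disease", "Gene", "SmallMolecule"],
    "object":  ["Gene", "Disease", "Disease"],
    "count":   [13, 12, 8],
})

# Notebook-style filter: only Disease -> Gene survives;
# SmallMolecule -> Disease has no reverse row here and is lost
one_way = toy[toy["subject"] <= toy["object"]]

# Alternative sketch: sort each pair so both directions share one key
canon = [tuple(sorted(pair)) for pair in zip(toy["subject"], toy["object"])]
sym = toy.assign(node_a=[a for a, _ in canon], node_b=[b for _, b in canon])

# Taking the max count per pair is a simplification, not a true merged count
merged = sym.groupby(["node_a", "node_b"], as_index=False)["count"].max()
```

In the notebook's data most pairs appear in both directions, so the simpler filter loses little; the canonical-pair variant is only worth the extra step if one-directional pairs matter.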


## Export to graphml

In [21]:
def create_graph(df2, filename):
    G = nx.Graph()

    node_types = set(pd.concat([df2['subject'], df2['object']]))

    # add_spacing is defined in the next cell; run that cell before calling create_graph
    for node_type in node_types:
        G.add_node(node_type, label=add_spacing(node_type))

    # use the 'count' column as the edge weight when it is present
    for _, row in df2.iterrows():
        if 'count' in row.keys():
            G.add_edge(row['subject'], row['object'], weight=row['count'])
        else:
            G.add_edge(row['subject'], row['object'])

    nx.write_graphml(G, filename, infer_numeric_types=True)
    return G

In [22]:
def add_spacing(node_type):
    # insert line breaks into long node type names so the labels fit in the diagram
    key = {
        "BiologicalProcess":               "Biological\nProcess",
        "ChemicalEntity":                  "Chemical\nEntity",
        "MolecularMixture":                "Molecular\nMixture",
        "PhysiologicalProcess":            "Physiological\nProcess",
        "SmallMolecule":                   "Small\nMolecule",
        "PhenotypicFeature":               "Phenotypic\nFeature",
        "ChemicalExposure":                "Chemical\nExposure",
        "ClinicalAttribute":               "Clinical\nAttribute",
        "ClinicalIntervention":            "Clinical\nIntervention",
        "ComplexMolecularMixture":         "Complex\nMolecular\nMixture",
        "EnvironmentalExposure":           "Environmental\nExposure",
        "InformationContentEntity":        "Information\nContentEntity",
        "PopulationOfIndividualOrganisms": "PopulationOf\nIndividualOrganisms",
        "GrossAnatomicalStructure":        "Gross\nAnatomical\nStructure",
        "PathologicalProcess":             "Pathological\nProcess",
    }
    # fall back to the unmodified name for types without an entry
    return key.get(node_type, node_type)

In [23]:
g = create_graph(api_stats_filt, "results/api_stats_filt.graphml")
g = create_graph(predicate_stats_filt, "results/predicate_stats_filt.graphml")
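
As a quick sanity check of the export, a graph written with `nx.write_graphml` can be read back with `nx.read_graphml`. The sketch below does this on toy data in a temporary directory. The rows are made up, and casting the weight to a plain `int` is a precaution of this sketch, not something `create_graph` does.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

toy = pd.DataFrame({
    "subject": ["Disease", "Disease"],
    "object":  ["Gene", "Disease"],
    "count":   [13, 4],
})

G_toy = nx.Graph()
for s, o, c in zip(toy["subject"], toy["object"], toy["count"]):
    # cast to a built-in int so the GraphML writer sees a plain Python type
    G_toy.add_edge(s, o, weight=int(c))

path = os.path.join(tempfile.mkdtemp(), "toy.graphml")
nx.write_graphml(G_toy, path)

# Round trip: node IDs come back as strings, typed attributes keep their type
G_back = nx.read_graphml(path)
```

Self-loops such as `Disease`-`Disease` are kept as ordinary edges, which is why same-type pairs show up in the exported graph.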

In [24]:
g.edges


EdgeView([('Gene', 'Gene'), ('Gene', 'Disease'), ('Gene', 'SmallMolecule'), ('Gene', 'Polypeptide'), ('Gene', 'PhenotypicFeature'), ('Gene', 'ChemicalEntity'), ('Gene', 'PathologicalProcess'), ('Gene', 'PhysiologicalProcess'), ('Polypeptide', 'Polypeptide'), ('Polypeptide', 'Disease'), ('Polypeptide', 'SmallMolecule'), ('Polypeptide', 'PathologicalProcess'), ('Polypeptide', 'ChemicalEntity'), ('Polypeptide', 'PhenotypicFeature'), ('Polypeptide', 'PhysiologicalProcess'), ('SmallMolecule', 'Disease'), ('SmallMolecule', 'SmallMolecule'), ('SmallMolecule', 'PhenotypicFeature'), ('SmallMolecule', 'PathologicalProcess'), ('SmallMolecule', 'PhysiologicalProcess'), ('SmallMolecule', 'ChemicalEntity'), ('PhysiologicalProcess', 'Disease'), ('PhysiologicalProcess', 'PhysiologicalProcess'), ('PhysiologicalProcess', 'PathologicalProcess'), ('PhysiologicalProcess', 'ChemicalEntity'), ('PhysiologicalProcess', 'PhenotypicFeature'), ('PhenotypicFeature', 'Disease'), ('PhenotypicFeature', 'PhenotypicFea

In [25]:
# all_simple_paths is a module-level networkx function, not a Graph method
list(nx.all_simple_paths(g, source="ChemicalEntity", target="Disease"))

## Restricted subway diagram

Ultimately, I wasn't able to get a usable figure out of this, but I'm keeping this section in case I want to experiment with it again later.

In [None]:
NUM_TYPES_TO_KEEP = 5

keep = set(pd.concat([df_bte['subject'], df_bte['object']]).value_counts().head(NUM_TYPES_TO_KEEP).keys())
keep

In [None]:
df_subway = df_bte.query('object in @keep & subject in @keep & subject < object').drop(columns=['predicate']).drop_duplicates()
df_subway

In [None]:
G = nx.MultiGraph()

df2 = df_subway

node_types = set(pd.concat([df2['subject'], df2['object']]))
        
for node_type in node_types:
    G.add_node(node_type, label = add_spacing(node_type))

In [None]:
# one edge per (subject, object, API) row; the running counter serves as a unique edge key
for idcount, (index, row) in enumerate(df2.iterrows()):
    G.add_edge(row['subject'], row['object'], api_name=row['api_name'], key=idcount)


In [None]:
nx.write_graphml(G, "results/subway.graphml", infer_numeric_types=True,named_key_ids=True)
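
The idea behind the subway layout is that each API becomes its own parallel edge (a "line") between two node types, which is why a `MultiGraph` with unique edge keys is used. Here is a toy sketch with made-up rows:

```python
import networkx as nx

# (subject, object, api_name) rows; values are made up for illustration
toy_rows = [
    ("Disease", "Gene", "Toy API A"),
    ("Disease", "Gene", "Toy API B"),
    ("Gene", "SmallMolecule", "Toy API A"),
]

M = nx.MultiGraph()
for key, (s, o, api) in enumerate(toy_rows):
    # a unique key per row keeps parallel edges between the same pair distinct
    M.add_edge(s, o, key=key, api_name=api)

# Number of parallel edges ("lines") between two node types
lines_between = M.number_of_edges("Disease", "Gene")

# APIs providing those edges
apis_between = sorted(d["api_name"] for d in M.get_edge_data("Disease", "Gene").values())
```

With a plain `nx.Graph`, the second `Disease`-`Gene` edge would overwrite the first, so the per-API information would be lost.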