# Nexus-hosted co-occurrence network analysis of CORD-19

In this notebook we will illustrate how different datasets from the co-occurrence analysis can be pushed and accessed within a Blue Brain Nexus project. To get

In [1]:
import getpass
import jwt
import nexussdk as nexus
import yaml

from kgforge.core import KnowledgeGraphForge

from cord19kg.apps.topic_widgets import (TopicWidget, DataSaverWidget)

## 1. Setting up a project and creating a `kgforge` configuration

If you have already set up a Nexus project and generated a 'forge' configuration, simply get your access token (__1.1__) and go directly to step __2. Set up a topic__.

### 1.1. Login to Nexus and get the access token

The [Nexus web application](https://sandbox.bluebrainnexus.io/web) can be used to log in and get a token:

1. Click on the login button in the right corner and follow the instructions.
<img src="../figures/nexus_log_in.png" alt="Drawing" style="width: 1000px;"/>


2. Once logged in, click on the `Copy token` button. The token will be copied to the clipboard.
<img src="../figures/nexus_logged_in.png" alt="Drawing" style="width: 1000px;"/>


Run the cell below and paste the token in the input field generated by the cell.

In [2]:
TOKEN = getpass.getpass()

········
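
Tokens expire, and an expired token typically only shows up later as a failed request. As a minimal sketch (not part of the original notebook), the claims of a JWT can be inspected with the standard library alone. The helper names `jwt_claims` and `is_expired` are hypothetical, and the signature is deliberately not verified, so this is only a local sanity check and not a security measure.

```python
import base64
import json
import time


def jwt_claims(token):
    """Return the (unverified) claims of a JWT as a dict."""
    # A JWT consists of three base64url segments: header.payload.signature
    payload = token.split(".")[1]
    # base64url segments omit their padding; restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token, now=None):
    """Return True if the token has an `exp` claim that lies in the past."""
    exp = jwt_claims(token).get("exp")
    now = time.time() if now is None else now
    return exp is not None and exp <= now
```

With the token pasted above, `is_expired(TOKEN)` would indicate whether a fresh token needs to be copied from the web application before continuing.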


### 1.2. Create a Nexus project programmatically

In the cell below, modify the variable `project` to choose a new project name.

In [None]:
org = "tutorialnexus"
project = "cord19kgExampleProject"  # Choose a project name
description = "cord19kg save/load example project"
endpoint = "https://sandbox.bluebrainnexus.io/v1"

nexus.config.set_environment(endpoint)
nexus.config.set_token(TOKEN)

# Uncomment the line below to create the project:
# nexus.projects.create(org_label=org, project_label=project, description=description)
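
Re-running a notebook that creates a project can fail if the project already exists. The following is a hedged sketch of a "create only if missing" pattern. The function `ensure_project` is hypothetical, and it takes the fetch and create operations as arguments so that the logic can be tried without a network connection. Treating every fetch failure as "not found" is a simplifying assumption.

```python
def ensure_project(fetch, create, org, project, description=""):
    """Create `org/project` only if `fetch` cannot find it.

    `fetch` and `create` are injected callables. Returns True if a
    project was created and False if it already existed.
    """
    try:
        fetch(org, project)
        return False  # the project already exists, nothing to do
    except Exception:
        # Assumption: any failure to fetch means "not found". Real code
        # should inspect the HTTP status and re-raise anything but a 404.
        create(org, project, description=description)
        return True
```

Assuming the SDK's `nexus.projects.fetch` and `nexus.projects.create` accept the organization and project labels as their first two arguments, the call would look like `ensure_project(nexus.projects.fetch, nexus.projects.create, org, project, description)`.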

### 1.3. Generate a `kgforge` configuration for your project

The following cell automatically generates a configuration file for the project set up above.

In [None]:
config = dict()

neuroshapes_path = "../models/neuroshapes"
! rm -Rf $neuroshapes_path
! git clone https://github.com/INCF/neuroshapes.git $neuroshapes_path
! cp -R $neuroshapes_path/shapes/neurosciencegraph/datashapes/core/dataset $neuroshapes_path/shapes/neurosciencegraph/commons/
! cp -R $neuroshapes_path/shapes/neurosciencegraph/datashapes/core/activity $neuroshapes_path/shapes/neurosciencegraph/commons/
! cp -R $neuroshapes_path/shapes/neurosciencegraph/datashapes/core/entity $neuroshapes_path/shapes/neurosciencegraph/commons/
! cp -R $neuroshapes_path/shapes/neurosciencegraph/datashapes/core/ontology $neuroshapes_path/shapes/neurosciencegraph/commons/
! cp -R $neuroshapes_path/shapes/neurosciencegraph/datashapes/core/person $neuroshapes_path/shapes/neurosciencegraph/commons/

config['Model'] = {
    "name": "RdfModel",
    "origin": "directory",
    "source": f"{neuroshapes_path}/shapes/neurosciencegraph/commons/",
    "context": {
        "iri": "../models/neuroshapes_context.json",
    },
}

config["Store"] = {
    "name": "BlueBrainNexus",
    "endpoint": endpoint,
    "searchendpoints": {
        "sparql": {
            "endpoint": "https://bluebrain.github.io/nexus/vocabulary/defaultSparqlIndex"
        }
    },
    "bucket": f"{org}/{project}",
    "versioned_id_template": "{x.id}?rev={x._store_metadata._rev}",
    "file_resource_mapping": "../config/file-to-resource-mapping.hjson"
}

with open("../config/forge-config.yml", "w") as f:
    yaml.dump(config, f)
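
Before the configuration is handed to `kgforge`, a quick structural check can catch a section that was accidentally left out. The sketch below is illustrative only: `REQUIRED_KEYS` and `missing_config_keys` are hypothetical names, and the list of required entries simply mirrors the keys written by the cell above. It is not the official `kgforge` configuration schema.

```python
# Assumption: these are the entries the cell above always writes; the
# authoritative list of required keys is defined by kgforge itself.
REQUIRED_KEYS = {
    "Model": ("name", "origin", "source", "context"),
    "Store": ("name", "endpoint", "bucket"),
}


def missing_config_keys(config):
    """Return dotted names of required entries absent from `config`."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            missing.append(section)
            continue
        missing.extend(
            f"{section}.{key}" for key in keys if key not in config[section]
        )
    return missing
```

Calling `missing_config_keys(config)` on the dictionary built above should return an empty list. A non-empty result names the entries to fix before the file is written.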

## 2. Set up a topic

Create a 'forge' to manage (create, access and deploy) knowledge within the given Blue Brain Nexus Project.

In [24]:
forge_config_file = "../config/forge-config.yml"
forge = KnowledgeGraphForge(forge_config_file, token=TOKEN, debug=True)

In [25]:
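# Read the username claim without verifying the token signature
# (`verify=False` is no longer accepted by PyJWT 2.x)
agent_username = jwt.decode(TOKEN, options={"verify_signature": False})['preferred_username']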

In [26]:
agent_username

'eugeniashurko'

In [27]:
widget = TopicWidget(forge, agent_username)
widget.display()

Tab(children=(VBox(children=(HBox(children=(Button(description='🔬 List all your topics', layout=Layout(height=…

<action> _register_one
<succeeded> False
<error> ValueError: 'str' object has no attribute 'items'
<action> _register_one
<succeeded> False
<error> ValueError: 'str' object has no attribute 'items'


In [8]:
import json
from cord19kg.utils import resolve_taxonomy_to_types

In [13]:
import pickle
import pandas as pd

In [9]:
with open("../data/NCIT_type_mapping.json", "rb") as f:
    type_mapping = json.load(f)

In [14]:
data = pd.read_pickle("/Users/oshurko/Downloads/cord_47_occurrence_data_linked.pkl")


In [None]:
data["paper_frequency"] = data["paper"].apply(len)

In [None]:
data = data.nlargest(10000, columns=["paper_frequency"])
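
The two cells above count the papers in which each entity occurs and keep only the most frequent entities. As a small illustration of that selection, here is the same idea in plain Python on made-up data. The function name `top_entities_by_paper_frequency` and the toy dictionary are hypothetical and are not taken from the CORD-19 dataset.

```python
import heapq


def top_entities_by_paper_frequency(papers_by_entity, n):
    """Return the `n` (entity, frequency) pairs that occur in the most papers."""
    # Frequency of an entity = number of distinct papers mentioning it
    frequency = {
        entity: len(papers) for entity, papers in papers_by_entity.items()
    }
    # Keep the n largest frequencies, sorted from highest to lowest
    return heapq.nlargest(n, frequency.items(), key=lambda item: item[1])


# Toy input: entity -> set of paper identifiers (invented for illustration)
toy = {
    "covid-19": {1, 2, 3, 4},
    "virus": {1, 2, 3},
    "resolvin": {7},
}
top_two = top_entities_by_paper_frequency(toy, 2)
```

Here `top_two` holds the two entities with the largest paper sets, in the same way that `nlargest(10000, columns=["paper_frequency"])` keeps the 10,000 most frequent entities of the full table.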

In [11]:
types = resolve_taxonomy_to_types(data, type_mapping)


In [24]:
# data.to_csv("/Users/oshurko/Downloads/cord_47_occurrence_data_linked.csv", index=False)

In [22]:
# top10000.to_csv("/Users/oshurko/Downloads/cord_47_occurrence_data_linked_10000.csv", index=False)

| entity | aggregated_entities | raw_entity_types | paragraph | paper | section | uid | definition | semantic_type | taxonomy | paper_frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| covid-19 | [(-)covid-19, (2019-ncov infection, 2019-cov i... | [PROTEIN, PROTEIN, DISEASE, DISEASE, DISEASE, ... | {196797:Title:0, 196486:Caption:41, 224080:Con... | {152649, 140629, 208950, 176798, 157625, 85783... | {190703:Patients Categorization , 224510:Treat... | http://purl.obolibrary.org/obo/NCIT_C171133 | An acute infection of the respiratory tract th... | Disease or Syndrome | [(http://purl.obolibrary.org/obo/NCIT_C3439, V... | 110145 |
| virus | [(vogel)virus, -virus, 229e viruses, 5-virus, ... | [ORGANISM, ORGANISM, ORGANISM, ORGANISM, ORGAN... | {200885:Viruses ::: Infectious Etiologies:13, ... | {161821, 176798, 157625, 186816, 113857, 17128... | {188478:Conclusions, 193751:Statistical Analys... | http://purl.obolibrary.org/obo/NCIT_C14283 | An infectious agent which consists of two part... | Virus | [(http://purl.obolibrary.org/obo/NCIT_C14250, ... | 75012 |
| infectious disorder | [' infection, 's infection, 's infections, * i... | [DISEASE, DISEASE, DISEASE, DISEASE, DISEASE, ... | {195062:Discussion:15, 200885:Viruses ::: Infe... | {161821, 176798, 184395, 186816, 171281, 17518... | {188478:Conclusions, 119364:The Population Bal... | http://purl.obolibrary.org/obo/NCIT_C26726 | A disorder resulting from the presence and act... | Disease or Syndrome | [(http://purl.obolibrary.org/obo/NCIT_C93210, ... | 73574 |
| coronavirus | [@coronavirus, c24coronavirus, coronavirus, co... | [ORGANISM, ORGANISM, ORGANISM, ORGANISM, ORGAN... | {80683:(Which Was Not Certified By Peer Review... | {152649, 176798, 184395, 186816, 113857, 17717... | {203505:Simultaneous Regression Analysis , 188... | http://purl.obolibrary.org/obo/NCIT_C26431 | A genus of single-stranded, positive-sense RNA... | Virus | [(http://purl.obolibrary.org/obo/NCIT_C113205,... | 67945 |
| human | [-human, ahuman, antihuman, antihuman, bnhuman... | [ORGANISM, CELL_TYPE, PROTEIN, PROTEIN, PROTEI... | {195062:Discussion:15, 20447:Common Routes Of ... | {161821, 184395, 186816, 185830, 34420, 227659... | {169065:Introduction, 177571:Caption, 163666:D... | http://purl.obolibrary.org/obo/NCIT_C14225 | The bipedal primate mammal, Homo sapiens; belo... | Human | [(http://purl.obolibrary.org/obo/NCIT_C79740, ... | 61816 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| postnasal drip, ctcae | [postnasal] | [DISEASE, DISEASE, DISEASE, DISEASE, DISEASE, ... | {22433:Acute Sinusitis ::: Infections ::: Nona... | {29906, 15996, 206328, 20831, 49596, 188949, 1... | {26789:History, 196118:Nasal Saline Irrigation... | http://purl.obolibrary.org/obo/NCIT_C143771 | A disorder characterized by excessive mucous s... | Finding | [(http://purl.obolibrary.org/obo/NCIT_C143181,... | 119 |
| refractory anemia | [chronic nonregenerative anemia, hyporegenerat... | [DISEASE, DISEASE, DISEASE, DISEASE, DISEASE, ... | {183918:Case Description:6, 22212:Causes And P... | {20580, 13218, 183436, 209998, 173575, 168370,... | {173575:Portosystemic Shunts, 182486:Potential... | http://purl.obolibrary.org/obo/NCIT_C2872 | A myelodysplastic syndrome characterized mainl... | Neoplastic Process | [(http://purl.obolibrary.org/obo/NCIT_C82591, ... | 119 |
| renal abscess | [adrenal abscess, adrenal abscesses, anorectal... | [DISEASE, DISEASE, DISEASE, DISEASE, DISEASE, ... | {7101:S316:303, 14862:Caption:47, 24175:Pleuri... | {14051, 206328, 14602, 72102, 7106, 14355, 175... | {104064:Abstract, 21308:Caption, 14862:Caption... | http://purl.obolibrary.org/obo/NCIT_C123017 | An abscess that is located within the renal pa... | Finding | [(http://purl.obolibrary.org/obo/NCIT_C26686, ... | 119 |
| resolvin | [at-resolvins, exogenous resolvins, non-resolv... | [PROTEIN, PROTEIN, PROTEIN, PROTEIN, DISEASE, ... | {7003:Alveolar Macrophages And Phagocytosis Of... | {14375, 187672, 221305, 3835, 168718, 177819, ... | {6116:University Of Sao Paulo, Sao Paulo, Braz... | http://purl.obolibrary.org/obo/NCIT_C126428 | Dihydroxy or trihydroxy polyunsaturated fatty ... | Organic Chemical | [(http://purl.obolibrary.org/obo/NCIT_C492, Fa... | 119 |
