**TODOs**
- [ ] The trained SciBERT model `scibert_chemprot.tar.gz` stores absolute paths
  to its vocabulary text and weights inside the archive, so it cannot be moved
  around without rewriting that internal metadata.
- [ ] SciBERT cannot be installed with `pip install`, so currently one needs to:
    1. `git clone https://github.com/allenai/scibert.git`
    2. `export PYTHONPATH=$PYTHONPATH:PATH_TO_SCIBERT`
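
To see which entries would need rewriting before the archive can be moved, one could list the absolute paths stored in its configuration. The following is a minimal sketch, assuming the archive contains a JSON file named `config.json`. That member name and the helpers `find_absolute_paths` and `absolute_paths_in_archive` are assumptions made for illustration; they are not part of SciBERT or `bbsearch`.

```python
import json
import tarfile


def find_absolute_paths(node, prefix=""):
    """Yield (key_path, value) for every string value that starts with '/'."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_absolute_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_absolute_paths(value, f"{prefix}/{index}")
    elif isinstance(node, str) and node.startswith("/"):
        yield prefix, node


def absolute_paths_in_archive(archive_path, member="config.json"):
    """List absolute paths found in a JSON member of a .tar.gz archive.

    ASSUMPTION: the member name "config.json" is hypothetical; adjust it to
    whatever file actually holds the model metadata.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        with archive.extractfile(member) as handle:
            config = json.load(handle)
    return list(find_absolute_paths(config))
```

Calling `absolute_paths_in_archive(PATH_TO_ARCHIVE)` would then return the key paths and values that point to fixed locations on disk, which are the candidates for rewriting.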


# Goal of the notebook
(to be completed)

In [None]:
from collections import OrderedDict
import logging
from pathlib import Path

import IPython
import ipywidgets
import pandas as pd
import scispacy
import spacy

from bbsearch.data import AllData
from bbsearch.embedding_models import EmbeddingModels
from bbsearch.mining import ChemProt, run_pipeline
from bbsearch.precomputed_embeddings import PrecomputedEmbeddings
from bbsearch.widget import Widget

# Set a Project
The user chooses or creates a project to host a knowledge graph (KG).

# Set topic
The user defines their topic.

# Data Import
The user loads data from a data source (CORD-19).
The loaded data forms the corpus.
The user searches the corpus in Blue Brain Search.

In [None]:
import nltk
nltk.download("punkt")
nltk.download("stopwords")

In [None]:
DATASET_VERSION = 'v7'

data_path = Path("/raid/covid_data/data/") / DATASET_VERSION
assets_path = Path("/raid/covid_data/assets")
embeddings_path = data_path / "embeddings"

models_to_load = ["SBIOBERT", "BSV"]

In [None]:
all_data = AllData(data_path)
embedding_models = EmbeddingModels(assets_path, models_to_load)
precomputed_embeddings = PrecomputedEmbeddings(embeddings_path, models_to_load)

In [None]:
bbs_widget = Widget(all_data, embedding_models, precomputed_embeddings)
bbs_widget.display()

# Set schemas
The user defines the KG schema.

# Create a knowledge graph according to schemas
The user extracts data from the text of a set of papers using selected Named Entity Recognizers and Relation Extractors from Blue Brain Search.
The user can preview the extracted data.
The user curates the extracted data.
The user links the extracted entities and relations to ontologies.
The user saves the data into the knowledge graph.

- **input**: raw text
- **output**: csv table of extracted entities/relations

In [None]:
DEFAULT_TEXT = """Autophagy maintains tumour growth through circulating
arginine. Autophagy captures intracellular components and delivers them to
lysosomes, where they are degraded and recycled to sustain metabolism and to
enable survival during starvation. Acute, whole-body deletion of the essential 
autophagy gene Atg7 in adult mice causes a systemic metabolic defect that 
manifests as starvation intolerance and gradual loss of white adipose tissue, 
liver glycogen and muscle mass.  Cancer cells also benefit from autophagy. 
Deletion of essential autophagy genes impairs the metabolism, proliferation, 
survival and malignancy of spontaneous tumours in models of autochthonous 
cancer. Acute, systemic deletion of Atg7 or acute, systemic expression of a 
dominant-negative ATG4b in mice induces greater regression of KRAS-driven 
cancers than does tumour-specific autophagy deletion, which suggests that host 
autophagy promotes tumour growth.
""".replace('\n', ' ').replace('  ', ' ')

In [None]:
# Entity extractor (EE): scispaCy NER model trained on the CRAFT corpus
ee_model = spacy.load("en_ner_craft_md")

# Relation extractors (RE), keyed by the pair of entity types they apply to
PATH_ASSETS = Path('/raid/covid_data/assets')
PATH_CHEMPROT_TRAINED_MODEL = PATH_ASSETS / 'scibert_chemprot.tar.gz'
re_models = {('CHEBI', 'GGP'): [ChemProt(PATH_CHEMPROT_TRAINED_MODEL)]}

In [None]:
# This is the output: csv table of extracted entities/relations.
table_extractions = None

In [None]:
# Define Widgets
bbs_widgets = OrderedDict()

# "Input Text" Widget
bbs_widgets['input_text'] = ipywidgets.Textarea(
    value=DEFAULT_TEXT,
    layout=ipywidgets.Layout(width='75%', height='300px')
)

# "Submit!" Button
bbs_widgets['submit_button'] = ipywidgets.Button(
    description='Extract Entities & Properties!',
    layout=ipywidgets.Layout(width='30%')
)


def cb(b):
    """Run the extraction pipeline on the input text and display the table."""
    global table_extractions
    bbs_widgets['out'].clear_output()
    with bbs_widgets['out']:
        text = bbs_widgets['input_text'].value
        table_extractions = run_pipeline(text, ee_model, re_models)
        IPython.display.display(table_extractions)


bbs_widgets['submit_button'].on_click(cb)

# "Output Area" Widget
bbs_widgets['out'] = ipywidgets.Output(layout={'border': '0.5px solid black'})

# Finalize Widgets
ordered_widgets = list(bbs_widgets.values())
main_widget = ipywidgets.VBox(ordered_widgets)
IPython.display.display(main_widget)

- **input**: csv table of extracted entities/relations
- **output**: knowledge graph
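
The step from table to graph can be sketched as follows: each row that holds a relation becomes a (subject, predicate, object) triple. This is only an illustration under stated assumptions. The column names `entity`, `property`, and `property_value` are hypothetical and may not match what `run_pipeline` returns, and `rows_to_triples` is not part of `bbsearch`.

```python
import csv
import io


def rows_to_triples(csv_text, subject="entity", predicate="property",
                    obj="property_value"):
    """Turn CSV rows into (subject, predicate, object) triples.

    ASSUMPTION: the column names are hypothetical placeholders.
    Rows with an empty predicate or object (entities without a relation)
    are skipped.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[predicate] and row[obj]:
            triples.append((row[subject], row[predicate], row[obj]))
    return triples


# Toy table with placeholder values, written by hand; not real pipeline output.
example_csv = """entity,property,property_value
chemical_A,RELATION_X,protein_B
entity_only,,
"""
triples = rows_to_triples(example_csv)
```

A list of triples is the simplest graph representation; it can later be loaded into whichever knowledge graph store the project uses.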

In [None]:
# Shape (rows, columns) of the extraction table.
# Click the button above first; until then `table_extractions` is still None.
table_extractions.shape

# Validate the knowledge graph
The user reviews the content of the knowledge graph.

# Correct knowledge graph
The user corrects the knowledge graph if errors occur.

# Access the knowledge graph
The user can search, visualize, and export the knowledge graph.

# Version the knowledge graph
The user can save a knowledge graph with a version.