**TODOs**
- [ ] The trained SciBERT model `scibert_chemprot.tar.gz` stores absolute paths
  to its vocabulary text and weights inside the archive, so it cannot be moved
  without rewriting that internal metadata.
- [ ] SciBERT cannot be installed with `pip install`, so for now one needs to:
    1. `git clone https://github.com/allenai/scibert.git`
    2. `export PYTHONPATH=$PYTHONPATH:PATH_TO_SCIBERT`
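
Until SciBERT is pip-installable, the second step above can also be done from inside the notebook. This is a minimal sketch, assuming the repository has already been cloned locally. The helper name `add_to_python_path` is hypothetical, and `PATH_TO_SCIBERT` is the same placeholder as in the list above.

```python
import sys
from pathlib import Path


def add_to_python_path(repo_dir):
    """Append repo_dir to sys.path if it is not already there.

    Returns the resolved path as a string.
    """
    resolved = str(Path(repo_dir).expanduser().resolve())
    if resolved not in sys.path:
        # Appending mirrors `export PYTHONPATH=$PYTHONPATH:PATH_TO_SCIBERT`.
        sys.path.append(resolved)
    return resolved


# Placeholder: replace with the directory where scibert was cloned.
scibert_dir = add_to_python_path("PATH_TO_SCIBERT")
```

Unlike the `export` command, this only affects the current Python process, so it has to run before any import that relies on the SciBERT code.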


In [None]:
from collections import OrderedDict
from pathlib import Path

import IPython
import ipywidgets
import pandas as pd
import scispacy
import spacy

from bbsearch.mining import ChemProt, run_pipeline

# BlueBrainSearch to BlueBrainGraph: POC

This notebook shows how to go from raw text to a knowledge graph: BlueBrainSearch tools first extract a list of objects of interest from the text, and BlueBrainGraph tools then build a knowledge graph from them.

It is intended only as a proof of concept (POC) of the pipeline.

## BlueBrainSearch

This first part of the pipeline takes the raw text of a scientific paper as input and generates a CSV table containing all the entities and relations identified in the text.
- **input**: raw text
- **output**: CSV table of extracted entities/relations
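
To make the output format concrete, here is a minimal sketch of serializing extraction rows to CSV with the standard library. The column names `entity` and `entity_type`, the helper `rows_to_csv`, and the example row are illustrative assumptions, not the actual schema or output of `run_pipeline`.

```python
import csv
import io

# Hypothetical column names; the real table returned by the pipeline may differ.
FIELDNAMES = ["entity", "entity_type"]


def rows_to_csv(rows, fieldnames=FIELDNAMES):
    """Serialize a list of row dictionaries to a CSV string with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row only, not actual model output.
example_rows = [{"entity": "arginine", "entity_type": "CHEBI"}]
csv_text = rows_to_csv(example_rows)
```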

In [None]:
DEFAULT_TEXT = """Autophagy maintains tumour growth through circulating
arginine. Autophagy captures intracellular components and delivers them to
lysosomes, where they are degraded and recycled to sustain metabolism and to
enable survival during starvation. Acute, whole-body deletion of the essential 
autophagy gene Atg7 in adult mice causes a systemic metabolic defect that 
manifests as starvation intolerance and gradual loss of white adipose tissue, 
liver glycogen and muscle mass.  Cancer cells also benefit from autophagy. 
Deletion of essential autophagy genes impairs the metabolism, proliferation, 
survival and malignancy of spontaneous tumours in models of autochthonous 
cancer. Acute, systemic deletion of Atg7 or acute, systemic expression of a 
dominant-negative ATG4b in mice induces greater regression of KRAS-driven 
cancers than does tumour-specific autophagy deletion, which suggests that host 
autophagy promotes tumour growth.
""".replace('\n', ' ').replace('  ', ' ')

In [None]:
# Entity extractors (EE)
ee_model = spacy.load("en_ner_craft_md")

# Relation extractors (RE)
# Each key is a pair of entity types; each value lists the models applied to that pair.
PATH_ASSETS = Path('/raid/covid_data/assets')
PATH_CHEMPROT_TRAINED_MODEL = PATH_ASSETS / 'scibert_chemprot.tar.gz'
re_models = {('CHEBI', 'GGP'): [ChemProt(PATH_CHEMPROT_TRAINED_MODEL)]}

In [None]:
# Define Widgets
bbs_widgets = OrderedDict()

# "Input Text" Widget
bbs_widgets['input_text'] = ipywidgets.Textarea(
    value=DEFAULT_TEXT,
    layout=ipywidgets.Layout(width='75%', height='300px')
)

# "Submit!" Button
bbs_widgets['submit_button'] = ipywidgets.Button(
    description='Extract Entities & Properties!',
    layout=ipywidgets.Layout(width='30%')
)
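

def cb(b):
    """Run the extraction pipeline on the input text and show the result."""
    # The output widget is looked up at click time, so it can be created below.
    bbs_widgets['out'].clear_output()
    with bbs_widgets['out']:
        text = bbs_widgets['input_text'].value
        table_extractions = run_pipeline(text, ee_model, re_models, return_prob=True)
        IPython.display.display(table_extractions)


bbs_widgets['submit_button'].on_click(cb)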

# "Output Area" Widget
bbs_widgets['out'] = ipywidgets.Output(layout={'border': '0.5px solid black'})

# Finalize Widgets
ordered_widgets = list(bbs_widgets.values())
main_widget = ipywidgets.VBox(ordered_widgets)
IPython.display.display(main_widget)

## BlueBrainGraph

This second part of the pipeline takes the CSV table of extracted entities and relations and generates a knowledge graph from it.

- **input**: CSV table of extracted entities/relations
- **output**: knowledge graph
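
The BlueBrainGraph tooling itself is not shown in this notebook. As a rough illustration of the idea only, the sketch below turns a CSV of relations into a plain adjacency mapping using the standard library. The column names `subject`, `relation` and `object`, the helper `csv_to_graph`, and the placeholder rows are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout with one relation per row; values are placeholders.
EXAMPLE_CSV = """subject,relation,object
entity_a,relation_x,entity_b
entity_a,relation_y,entity_c
"""


def csv_to_graph(csv_text):
    """Return an adjacency mapping: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph[row["subject"]].append((row["relation"], row["object"]))
        # Register the object as a node even if it has no outgoing edges.
        graph.setdefault(row["object"], [])
    return dict(graph)


graph = csv_to_graph(EXAMPLE_CSV)
```

Each key of `graph` is a node and each value lists its outgoing labelled edges, which is enough to hand over to a dedicated graph library later.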