# Minimal working example

### Goal

Given a sentence (a single sentence, for simplicity), find CHEMICAL/DRUG and PROTEIN/GENE entities. If there is exactly one of each, predict their relationship.
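The "exactly one of each" rule can be sketched as a small filter over the entities found in one sentence. This is a minimal sketch under assumptions: the `(text, label, start_char, end_char)` tuple layout, the hypothetical helper `find_candidate_pair`, and the default label sets (`CHEBI` for chemicals, `GGP` for genes/proteins, as emitted by scispaCy's `en_ner_craft_md`) are choices made for illustration, not part of scispaCy. The offsets in the example are illustrative.

```python
from typing import Iterable, Optional, Tuple

# Assumed entity layout for this sketch: (text, label, start_char, end_char).
Entity = Tuple[str, str, int, int]

# Hypothetical default label sets (en_ner_craft_md labels).
CHEMICAL_LABELS = frozenset({"CHEBI"})
PROTEIN_LABELS = frozenset({"GGP"})


def find_candidate_pair(
    entities: Iterable[Entity],
    chemical_labels=CHEMICAL_LABELS,
    protein_labels=PROTEIN_LABELS,
) -> Optional[Tuple[Entity, Entity]]:
    """Return (chemical, protein) if the sentence has exactly one of each, else None."""
    entities = list(entities)
    chemicals = [e for e in entities if e[1] in chemical_labels]
    proteins = [e for e in entities if e[1] in protein_labels]
    if len(chemicals) == 1 and len(proteins) == 1:
        return chemicals[0], proteins[0]
    # Zero or several candidates of either type: skip this sentence.
    return None


example = [("arginine", "CHEBI", 0, 8), ("ARG1", "GGP", 20, 24), ("mice", "TAXON", 30, 34)]
print(find_candidate_pair(example))
# → (('arginine', 'CHEBI', 0, 8), ('ARG1', 'GGP', 20, 24))
```

Entities of other types (here `TAXON`) are ignored, so they do not disqualify a sentence.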

### Resources
- ChemProt relationship types (slide 19): https://biocreative.bioinformatics.udel.edu/media/store/files/2017/chemprot_overview_v03.pdf


### Relationship classes
- Upregulator
- Downregulator
- Agonist
- Antagonist
- Substrate
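For a classifier, the five classes need a fixed order and integer ids. A minimal sketch follows; the column order matches `relationships.csv`, and the CPR group ids are our reading of the ChemProt overview (slide 19), so verify them against the slides before relying on them.

```python
# The five relationship classes, in the column order of relationships.csv.
RELATION_CLASSES = ["upregulator", "downregulator", "agonist", "antagonist", "substrate"]

# Assumed correspondence to the ChemProt evaluation groups (check slide 19).
CPR_GROUP = {
    "upregulator": "CPR:3",
    "downregulator": "CPR:4",
    "agonist": "CPR:5",
    "antagonist": "CPR:6",
    "substrate": "CPR:9",
}

# Integer ids for a classifier's output layer, and the reverse lookup.
LABEL2ID = {name: i for i, name in enumerate(RELATION_CLASSES)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}

print(LABEL2ID["antagonist"], ID2LABEL[4], CPR_GROUP["substrate"])
# → 3 substrate CPR:9
```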

In [None]:
inp = """
Autophagy maintains tumour growth through circulating arginine. Autophagy captures intracellular components and delivers them to lysosomes, where they are degraded and recycled to sustain metabolism and to enable survival during starvation1-5. Acute, whole-body deletion of the essential autophagy gene Atg7 in adult mice causes a systemic metabolic defect that manifests as starvation intolerance and gradual loss of white adipose tissue, liver glycogen and muscle mass1. Cancer cells also benefit from autophagy. Deletion of essential autophagy genes impairs the metabolism, proliferation, survival and malignancy of spontaneous tumours in models of autochthonous cancer6,7. Acute, systemic deletion of Atg7 or acute, systemic expression of a dominant-negative ATG4b in mice induces greater regression of KRAS-driven cancers than does tumour-specific autophagy deletion, which suggests that host autophagy promotes tumour growth1,8. Here we show that host-specific deletion of Atg7 impairs the growth of multiple allografted tumours, although not all tumour lines were sensitive to host autophagy status. Loss of autophagy in the host was associated with a reduction in circulating arginine, and the sensitive tumour cell lines were arginine auxotrophs owing to the lack of expression of the enzyme argininosuccinate synthase 1. Serum proteomic analysis identified the arginine-degrading enzyme arginase I (ARG1) in the circulation of Atg7-deficient hosts, and in vivo arginine metabolic tracing demonstrated that serum arginine was degraded to ornithine. ARG1 is predominantly expressed in the liver and can be released from hepatocytes into the circulation. Liver-specific deletion of Atg7 produced circulating ARG1, and reduced both serum arginine and tumour growth. Deletion of Atg5 in the host similarly regulated [corrected] circulating arginine and suppressed tumorigenesis, which demonstrates that this phenotype is specific to autophagy function rather than to deletion of Atg7. 
Dietary supplementation of Atg7-deficient hosts with arginine partially restored levels of circulating arginine and tumour growth. Thus, defective autophagy in the host leads to the release of ARG1 from the liver and the degradation of circulating arginine, which is essential for tumour growth; this identifies a metabolic vulnerability of cancer. (PMID:30429607)
"""

In [None]:
# entities.csv (target format; the rows below are illustrative placeholders)
# id, entity_type, entity, start_pos, end_pos
# 1, chemical, chemical_1, 0, 9
# 2, protein, protein_b, 24, 32
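A minimal sketch of serializing entities into the `entities.csv` schema above, using only the standard library. `entities_to_csv` is a hypothetical helper, and assigning ids 1, 2, ... in input order (so that `relationships.csv` can refer to them) is an assumption.

```python
import csv
import io

# Columns of entities.csv as listed above.
ENTITY_COLUMNS = ["id", "entity_type", "entity", "start_pos", "end_pos"]


def entities_to_csv(entities):
    """Serialize (entity_type, entity, start_pos, end_pos) tuples to CSV text.

    Ids are assigned 1, 2, ... in input order.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(ENTITY_COLUMNS)
    for entity_id, (entity_type, text, start, end) in enumerate(entities, start=1):
        writer.writerow([entity_id, entity_type, text, start, end])
    return buffer.getvalue()


csv_text = entities_to_csv([("chemical", "chemical_1", 0, 9), ("protein", "protein_b", 24, 32)])
print(csv_text, end="")
# → id,entity_type,entity,start_pos,end_pos
#   1,chemical,chemical_1,0,9
#   2,protein,protein_b,24,32
```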


In [None]:
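# relationships.csv (target format): one row per (chemical, protein) pair, where
# subject_id/object_id are ids from entities.csv and each class gets a probability;
# 0.2 for all five classes is a uniform placeholder
# id, subject_id, object_id, upregulator, downregulator, agonist, antagonist, substrate
# 1, 1, 2, 0.2, 0.2, 0.2, 0.2, 0.2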
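The example row is a uniform baseline: with five classes, each gets probability 1/5 = 0.2, so the scores sum to 1. A sketch that emits such a row; `uniform_relationship_row` is a hypothetical placeholder that a trained classifier would later replace.

```python
import csv
import io

RELATION_CLASSES = ["upregulator", "downregulator", "agonist", "antagonist", "substrate"]
RELATIONSHIP_COLUMNS = ["id", "subject_id", "object_id"] + RELATION_CLASSES


def uniform_relationship_row(rel_id, subject_id, object_id):
    """Placeholder prediction: equal probability for every class (1/5 = 0.2)."""
    p = 1.0 / len(RELATION_CLASSES)
    return [rel_id, subject_id, object_id] + [p] * len(RELATION_CLASSES)


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(RELATIONSHIP_COLUMNS)
writer.writerow(uniform_relationship_row(1, 1, 2))  # chemical id 1, protein id 2
print(buffer.getvalue(), end="")
# → id,subject_id,object_id,upregulator,downregulator,agonist,antagonist,substrate
#   1,1,2,0.2,0.2,0.2,0.2,0.2
```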

# 0) Imports

In [None]:
import pandas as pd
import requests
import scispacy
import spacy
from spacy import displacy


# 1) NER

# 1.A) scispaCy


The `en_ner_craft_md` model is trained on the CRAFT corpus (http://bionlp-corpora.sourceforge.net/CRAFT/), which annotates concepts from these ontologies:

- Chemical Entities of Biological Interest
- Cell Ontology
- Entrez Gene
- Gene Ontology (biological process, cellular component, and molecular function)
- NCBI Taxonomy
- Protein Ontology
- Sequence Ontology
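`en_ner_craft_md` reports each entity with a short label. To our knowledge these are `CHEBI`, `CL`, `GGP`, `GO`, `SO` and `TAXON`, with `GGP` (gene or gene product) covering the Entrez Gene and Protein Ontology concepts; check `nlp.get_pipe("ner").labels` on your installed model. The mapping onto this notebook's two entity types (`GOAL_TYPE`, `to_goal_type`) is a hypothetical choice for the sketch.

```python
# Entity labels assumed to be emitted by en_ner_craft_md, mapped to the ontologies above.
CRAFT_LABELS = {
    "CHEBI": "Chemical Entities of Biological Interest",
    "CL": "Cell Ontology",
    "GGP": "Gene or gene product (Entrez Gene / Protein Ontology)",
    "GO": "Gene Ontology",
    "SO": "Sequence Ontology",
    "TAXON": "NCBI Taxonomy",
}

# Hypothetical mapping onto the two entity types the Goal needs; other labels are dropped.
GOAL_TYPE = {"CHEBI": "chemical", "GGP": "protein"}


def to_goal_type(label):
    """Return 'chemical', 'protein', or None for labels we ignore."""
    return GOAL_TYPE.get(label)


print([to_goal_type(label) for label in ["CHEBI", "GGP", "TAXON"]])
# → ['chemical', 'protein', None]
```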

In [None]:
# Load the scispaCy CRAFT NER model (the en_ner_craft_md package must be installed)
nlp = spacy.load("en_ner_craft_md")

In [None]:
doc = nlp(inp)

In [None]:
# One row per recognised entity, with character offsets into `inp`
rows = [{'entity': ent.text,
         'start_position': ent.start_char,
         'end_position': ent.end_char,
         'entity_type': ent.label_}
        for ent in doc.ents]

df = pd.DataFrame(rows)

In [None]:
# Write the (0-based) pandas index as an explicit `id` column
df.to_csv('entities.csv', index_label='id')

In [None]:
# Render the recognised entities one sentence at a time
for sent in doc.sents:
    displacy.render(sent, style='ent', jupyter=True)
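Once a sentence has exactly one chemical and one protein, a relation classifier needs to know which mentions form the pair. One common approach, used for example in some ChemProt preprocessing for BERT-style models, replaces the two mentions with placeholder tags. The tag strings `@CHEMICAL$` / `@GENE$` and the helper `mask_pair` are assumptions for this sketch. Offsets are `(start, end)` character offsets relative to the sentence, so with spaCy spans subtract `sent.start_char` from `ent.start_char` / `ent.end_char`.

```python
def mask_pair(sentence, chemical_span, protein_span,
              chemical_tag="@CHEMICAL$", protein_tag="@GENE$"):
    """Replace the two entity spans (start, end char offsets) with placeholder tags.

    Spans are replaced right-to-left so the earlier offsets stay valid.
    Assumes the two spans do not overlap.
    """
    spans = sorted(
        [(chemical_span, chemical_tag), (protein_span, protein_tag)],
        key=lambda item: item[0][0],
        reverse=True,
    )
    for (start, end), tag in spans:
        sentence = sentence[:start] + tag + sentence[end:]
    return sentence


s = "ARG1 degrades arginine."
print(mask_pair(s, chemical_span=(14, 22), protein_span=(0, 4)))
# → @GENE$ degrades @CHEMICAL$.
```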

# 1.B) External API

BERN, a BioBERT-based biomedical named entity recognition and normalization tool: https://github.com/dmis-lab/bern

In [None]:
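def query_raw(text, url="https://bern.korea.ac.kr/plain"):
    """POST plain text to the BERN server and return the parsed JSON response."""
    response = requests.post(url, data={'sample_text': text}, timeout=60)
    response.raise_for_status()  # surface HTTP errors instead of a JSON decode error
    return response.json()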
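To compare BERN with scispaCy, the response can be flattened into the same columns as the scispaCy `df`. This is a sketch under a loud assumption: that the response follows a PubAnnotation-like layout with a `text` field and a `denotations` list whose items carry `obj` and `span.begin` / `span.end`. Inspect `res` first and adjust the keys if they differ. `denotations_to_rows` is a hypothetical helper, and `fake_res` is made-up data, not real BERN output.

```python
def denotations_to_rows(res):
    """Flatten a BERN-style response into entity rows.

    Assumes {"text": ..., "denotations": [{"obj": ..., "span": {"begin": b, "end": e}}, ...]}.
    """
    text = res.get("text", "")
    rows = []
    for d in res.get("denotations", []):
        begin, end = d["span"]["begin"], d["span"]["end"]
        rows.append({
            "entity": text[begin:end],
            "start_position": begin,
            "end_position": end,
            "entity_type": d["obj"],
        })
    return rows


# Made-up response used only to exercise the function.
fake_res = {"text": "ARG1 degrades arginine.",
            "denotations": [{"obj": "gene", "span": {"begin": 0, "end": 4}},
                            {"obj": "drug", "span": {"begin": 14, "end": 22}}]}
for row in denotations_to_rows(fake_res):
    print(row)
```

Because the keys match the scispaCy rows, `pd.DataFrame(denotations_to_rows(res))` would line up with `df`.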

In [None]:
%%time
# Send only the first sentence to keep the request small
text = next(doc.sents).text

res = query_raw(text)

In [None]:
res