# Usage: just run all cells

In [None]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.0/en_core_sci_sm-0.2.0.tar.gz

In [None]:
import pandas as pd
import spacy
import scispacy
from scispacy.abbreviation import AbbreviationDetector
from scispacy.umls_linking import UmlsEntityLinker

nlp = spacy.load("en_core_sci_sm")

abbreviation_pipe = AbbreviationDetector(nlp)

nlp.add_pipe(abbreviation_pipe)

## The following cell might take a while to run (up to 30 minutes)!

In [None]:
%%time
# %%time runs the cell once and reports how long it took.

# This line takes a while, because we have to download ~1GB of data
# and load a large JSON file (the knowledge base). Be patient!
# Thankfully it should be faster after the first time you use it, because
# the downloads are cached.
# NOTE: The resolve_abbreviations parameter is optional, and requires that
# the AbbreviationDetector pipe has already been added to the pipeline. Adding
# the AbbreviationDetector pipe and setting resolve_abbreviations to True means
# that linking will only be performed on the long form of abbreviations.

linker = UmlsEntityLinker(resolve_abbreviations=True)

nlp.add_pipe(linker)

## The text to annotate (can be anything)

In [None]:
text = """
Fumonisins, mycotoxins primarily produced by Fusarium verticillioides and Fusarium proliferatum, 
occur predominantly in cereal grains, especially in maize. The European Commission asked EFSA for a 
scientific opinion on the risk to animal health related to fumonisins and their modified and 
hidden forms in feed. Fumonisin B1 (FB1), FB2 and FB3 are the most common forms of fumonisins in 
feedstuffs and thus were included in the assessment. FB1, FB2 and FB3 have the same mode of action and were 
considered as having similar toxicological profile and potencies. For fumonisins, the EFSA Panel on Contaminants 
in the Food Chain (CONTAM) identified no‐observed‐adverse‐effect levels (NOAELs) for cattle, pig, 
poultry (chicken, ducks and turkeys), horse, and lowest‐observed‐adverse‐effect levels (LOAELs) for 
fish (extrapolated from carp) and rabbits. No reference points could be identified for sheep, goats, dogs, 
cats and mink. The dietary exposure was estimated on 18,140 feed samples on FB1–3 representing most of the 
feed commodities with potential presence of fumonisins. Samples were collected between 2003 and 2016 from 19 
different European countries, but most of them from four Member States. To take into account the possible 
occurrence of hidden forms, an additional factor of 1.6, derived from the literature, was applied to the 
occurrence data. Modified forms of fumonisins, for which no data were identified concerning both the occurrence 
and the toxicity, were not included in the assessment. Based on mean exposure estimates, the risk of adverse 
health effects of feeds containing FB1–3 was considered very low for ruminants, low for poultry, horse, rabbits, 
fish and of potential concern for pigs. The same conclusions apply to the sum of FB1–3 and their hidden forms, 
except for pigs for which the risk of adverse health effect was considered of concern.
"""
doc = nlp(text)

print(list(doc.sents))
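
The `resolve_abbreviations=True` setting used earlier depends on what the `AbbreviationDetector` found, so it can help to list the detected abbreviations. The following is a minimal sketch, assuming the detector exposes its results as `doc._.abbreviations` and that each short form carries its definition in `._.long_form`, as described in the scispaCy documentation. The helper name `abbreviation_pairs` is made up for this example.

```python
def abbreviation_pairs(doc):
    """Return sorted, de-duplicated (short form, long form) text pairs.

    Assumes `doc` was processed by a pipeline that contains scispaCy's
    AbbreviationDetector, so that `doc._.abbreviations` is populated.
    """
    pairs = set()
    for abrv in doc._.abbreviations:
        # Each detected short form points at the span that defines it.
        pairs.add((abrv.text, abrv._.long_form.text))
    return sorted(pairs)
```

In this notebook it could be called as `abbreviation_pairs(doc)` once `doc = nlp(text)` has run.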

## Extract all "entities" from the text

This step only marks entity spans; every span gets the same single type, "entity".

In [None]:
entities = doc.ents
index = range(len(entities))
entities_df = pd.DataFrame(data={'index': index, 'entities': entities})
entities_df.head(50)
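
The table above stores spaCy `Span` objects directly. If plain strings and character offsets are easier to work with (for example, when exporting the table), a sketch like the following may help. It relies only on the standard `Span` attributes `text`, `start_char` and `end_char`; the helper name `entity_rows` is hypothetical.

```python
def entity_rows(ents):
    """Turn spaCy entity spans into plain dictionaries, one per entity."""
    rows = []
    for i, ent in enumerate(ents):
        rows.append({
            "index": i,                    # position in doc.ents
            "text": ent.text,              # surface string of the entity
            "start_char": ent.start_char,  # character offset where it starts
            "end_char": ent.end_char,      # character offset where it ends
        })
    return rows
```

In this notebook, `pd.DataFrame(entity_rows(doc.ents))` should give a table with one row per entity and the same index values as the table above.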

## Show all UMLS annotations per entity

In [None]:
# This takes a few seconds to run.


entity_id_to_inspect = 46  # (use any index from the table above)


entity = doc.ents[entity_id_to_inspect]

print("Name: ", entity)

# Each entity is linked to UMLS with a score
# (currently just char-3gram matching).
for umls_ent in entity._.umls_ents:
    # umls_ent[0] is the concept ID (CUI) used to look up the full UMLS record.
    print(linker.umls.cui_to_entity[umls_ent[0]])
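
Because each candidate comes with a score, it can be useful to keep only the strongest matches. The sketch below assumes, as the loop above suggests, that each item of `entity._.umls_ents` is a `(CUI, score)` pair and that a higher score means a closer match. The helper name `top_candidates` and the default values `min_score=0.85` and `limit=3` are arbitrary illustrative choices, not values taken from scispaCy.

```python
def top_candidates(umls_ents, min_score=0.85, limit=3):
    """Return up to `limit` (cui, score) pairs scoring at least `min_score`."""
    kept = [(cui, score) for cui, score in umls_ents if score >= min_score]
    # Highest score first; the sort is stable, so ties keep their input order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

In this notebook it could be called as `top_candidates(entity._.umls_ents)`, with each returned CUI then looked up in `linker.umls.cui_to_entity`.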
	


In [None]:
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)