# Usage: run all cells

In [None]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.0/en_core_sci_sm-0.2.0.tar.gz

In [1]:
import pandas as pd
import spacy
import scispacy
from scispacy.abbreviation import AbbreviationDetector
from scispacy.umls_linking import UmlsEntityLinker

nlp = spacy.load("en_core_sci_sm")

abbreviation_pipe = AbbreviationDetector(nlp)
nlp.add_pipe(abbreviation_pipe)

In [2]:
# This line takes a while, because we have to download ~1GB of data
# and load a large JSON file (the knowledge base). Be patient!
# Thankfully it should be faster after the first time you use it, because
# the downloads are cached.
# NOTE: The resolve_abbreviations parameter is optional, and requires that
# the AbbreviationDetector pipe has already been added to the pipeline. Adding
# the AbbreviationDetector pipe and setting resolve_abbreviations to True means
# that linking will only be performed on the long form of abbreviations.
linker = UmlsEntityLinker(resolve_abbreviations=True)

nlp.add_pipe(linker)
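
The `resolve_abbreviations` note above depends on pairing a short form such as `FB1` with its long form `Fumonisin B1`. As a rough illustration of that pairing, here is a minimal, standard-library-only sketch of a Schwartz-Hearst-style heuristic. It is not scispacy's implementation: the function names `find_best_long_form` and `find_abbreviations`, the regular expression, and the window-size rule are all assumptions made for this example.

```python
import re


def find_best_long_form(short_form, candidate):
    """Return the trailing words of `candidate` that spell out `short_form`, or None."""
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1
    # Walk both strings right to left, matching each short-form character.
    while s_index >= 0:
        char = short_form[s_index].lower()
        if not char.isalnum():
            s_index -= 1
            continue
        # The first short-form character must match at the start of a word.
        while (l_index >= 0 and candidate[l_index].lower() != char) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1
    # Extend back to the beginning of the word that holds the first match.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


def find_abbreviations(text):
    """Map each parenthesised short form to a guessed long form (hypothetical helper)."""
    pairs = {}
    for match in re.finditer(r"\(([^()\s]{2,10})\)", text):
        short_form = match.group(1)
        if not short_form[0].isalnum():
            continue
        # Assumed window: only look at a few words before the parenthesis.
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        words = text[: match.start()].split()[-max_words:]
        long_form = find_best_long_form(short_form, " ".join(words))
        if long_form:
            pairs[short_form] = long_form
    return pairs


sample = "Fumonisin B1 (FB1), FB2 and FB3 are the most common forms of fumonisins."
print(find_abbreviations(sample))  # {'FB1': 'Fumonisin B1'}
```

With `resolve_abbreviations=True`, the linker would then look up the long form (`Fumonisin B1`) rather than the short form (`FB1`).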



In [4]:
text = """
Fumonisins, mycotoxins primarily produced by Fusarium verticillioides and Fusarium proliferatum, 
occur predominantly in cereal grains, especially in maize. The European Commission asked EFSA for a 
scientific opinion on the risk to animal health related to fumonisins and their modified and 
hidden forms in feed. Fumonisin B1 (FB1), FB2 and FB3 are the most common forms of fumonisins in 
feedstuffs and thus were included in the assessment. FB1, FB2 and FB3 have the same mode of action and were 
considered as having similar toxicological profile and potencies. For fumonisins, the EFSA Panel on Contaminants 
in the Food Chain (CONTAM) identified no‐observed‐adverse‐effect levels (NOAELs) for cattle, pig, 
poultry (chicken, ducks and turkeys), horse, and lowest‐observed‐adverse‐effect levels (LOAELs) for 
fish (extrapolated from carp) and rabbits. No reference points could be identified for sheep, goats, dogs, 
cats and mink. The dietary exposure was estimated on 18,140 feed samples on FB1–3 representing most of the 
feed commodities with potential presence of fumonisins. Samples were collected between 2003 and 2016 from 19 
different European countries, but most of them from four Member States. To take into account the possible 
occurrence of hidden forms, an additional factor of 1.6, derived from the literature, was applied to the 
occurrence data. Modified forms of fumonisins, for which no data were identified concerning both the occurrence 
and the toxicity, were not included in the assessment. Based on mean exposure estimates, the risk of adverse 
health effects of feeds containing FB1–3 was considered very low for ruminants, low for poultry, horse, rabbits, 
fish and of potential concern for pigs. The same conclusions apply to the sum of FB1–3 and their hidden forms, 
except for pigs for which the risk of adverse health effect was considered of concern.
"""
doc = nlp(text)

print(list(doc.sents))

[
Fumonisins, mycotoxins primarily produced by Fusarium verticillioides and Fusarium proliferatum, 
occur predominantly in cereal grains, especially in maize., The European Commission asked EFSA for a 
scientific opinion on the risk to animal health related to fumonisins and their modified and 
hidden forms in feed., Fumonisin B1 (FB1), FB2 and FB3 are the most common forms of fumonisins in 
feedstuffs and thus were included in the assessment., FB1, FB2 and FB3 have the same mode of action and were 
considered as having similar toxicological profile and potencies., For fumonisins, the EFSA Panel on Contaminants 
in the Food Chain (CONTAM) identified no‐observed‐adverse‐effect levels (NOAELs) for cattle, pig, 
poultry (chicken, ducks and turkeys), horse, and lowest‐observed‐adverse‐effect levels (LOAELs) for 
fish (extrapolated from carp) and rabbits., No reference points could be identified for sheep, goats, dogs, 
cats and mink., The dietary exposure was estimated on 18,140 feed sampl

## Extract all entities from the text

In [5]:
# Collect every entity span detected in the document into a DataFrame.
entities = doc.ents
index = range(len(entities))
entities_df = pd.DataFrame(data={'index': index, 'entities': entities})
entities_df.head(50)

|   | index | entities |
|---|-------|----------|
| 0 | 0 | (Fumonisins) |
| 1 | 1 | (mycotoxins) |
| 2 | 2 | (Fusarium, verticillioides) |
| 3 | 3 | (Fusarium, proliferatum) |
| 4 | 4 | (cereal, grains) |
| 5 | 5 | (maize) |
| 6 | 6 | (European, Commission, asked, EFSA) |
| 7 | 7 | (risk) |
| 8 | 8 | (animal) |
| 9 | 9 | (health) |


## Show all UMLS annotations per entity

In [6]:
entity = doc.ents[46]

print("Name: ", entity)

# Each entity is linked to UMLS with a score
# (currently just char-3gram matching).
for umls_ent in entity._.umls_ents:
    print(linker.umls.cui_to_entity[umls_ent[0]])


Name:  cats
CUI: C0007450, Name: Felis catus
Definition: The domestic cat, Felis catus, of the carnivore family FELIDAE, comprising over 30 different breeds. The domestic cat is descended primarily from the wild cat of Africa and extreme southwestern Asia. Though probably present in towns in Palestine as long ago as 7000 years, actual domestication occurred in Egypt about 4000 years ago. (From Walker's Mammals of the World, 6th ed, p801)
TUI(s): T015
Aliases (abbreviated, total: 40): 
	 Felis catus, Felis catus, Felis catus, Felis catus, Cats, Cats, Cats, cats, cats, Cat
CUI: C0325089, Name: Family Felidae
Definition: The cat family in the order CARNIVORA comprised of muscular, deep-chested terrestrial carnivores with a highly predatory lifestyle.
TUI(s): T015
Aliases (abbreviated, total: 19): 
	 Family Felidae, Cat, cat, cat, catting, cats, FAMILY FELIDAE - FELIDS, Felids, Felid, feline
CUI: C1825121, Name: FAM64A gene
Definition: None
TUI(s): T028
Aliases: (total: 8): 
	 FAM64A gene,
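
The comment in the cell above describes the linking score as character 3-gram matching. To give a feel for what that means, the following standard-library sketch scores a mention against a few aliases by the Jaccard overlap of their character 3-grams. This is a deliberately simplified stand-in, not scispacy's actual scoring: the helper names `char_ngrams`, `ngram_similarity`, and `rank_aliases`, the padding scheme, and the choice of Jaccard overlap are assumptions for illustration only.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of `text`, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def rank_aliases(mention, aliases):
    """Sort aliases by descending similarity to the mention (hypothetical helper)."""
    scored = [(alias, ngram_similarity(mention, alias)) for alias in aliases]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Aliases taken from the output above; the scores are from this toy measure only.
for alias, score in rank_aliases("cats", ["Felis catus", "Cats", "FAM64A gene"]):
    print(f"{alias}: {score:.2f}")
```

A surface-overlap measure of this kind can explain why an unrelated concept such as `FAM64A gene` may still appear among the candidates for `cats`: one of its aliases only needs to share enough character 3-grams with the mention.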

In [None]:
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)